From 98f3e8f875293f97daa95bda21f8d581b3e0973e Mon Sep 17 00:00:00 2001 From: Daniel Glogowski Date: Wed, 22 May 2024 22:45:41 -0700 Subject: [PATCH 01/13] dglog/langchain docs and nbs nim updates --- libs/ai-endpoints/README.md | 34 ++++- .../docs/chat/nvidia_ai_endpoints.ipynb | 74 +++++---- libs/ai-endpoints/docs/providers/nvidia.mdx | 54 +++++-- .../docs/retrievers/nvidia_rerank.ipynb | 142 +++++++++++++----- .../text_embedding/nvidia_ai_endpoints.ipynb | 50 ++++-- 5 files changed, 262 insertions(+), 92 deletions(-) diff --git a/libs/ai-endpoints/README.md b/libs/ai-endpoints/README.md index d1b2649e..f7c0e8d9 100644 --- a/libs/ai-endpoints/README.md +++ b/libs/ai-endpoints/README.md @@ -1,10 +1,12 @@ # langchain-nvidia-ai-endpoints -The `langchain-nvidia-ai-endpoints` package contains LangChain integrations for chat models and embeddings powered by the [NVIDIA AI Foundation Model](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) playground environment. +The `langchain-nvidia-ai-endpoints` package contains LangChain integrations for chat models and embeddings powered by [NVIDIA AI Foundation Models](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), and hosted on [NVIDIA API Catalog.](https://build.nvidia.com/) -> [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) give users easy access to hosted endpoints for generative AI models like Llama-2, SteerLM, Mistral, etc. Using the API, you can query live endpoints available on the [NVIDIA API Catalog](https://build.nvidia.com/) to get quick results from a DGX-hosted cloud compute environment. All models are source-accessible and can be deployed on your own compute cluster. +NVIDIA AI Foundation models are community and NVIDIA-built models and are NVIDIA-optimized to deliver the best performance on NVIDIA accelerated infrastructure.  Using the API, you can query live endpoints available on the NVIDIA API Catalog to get quick results from a DGX-hosted cloud compute environment. All models are source-accessible and can be deployed on your own compute cluster using NVIDIA NIM which is part of NVIDIA AI Enterprise. -Below is an example on how to use some common functionality surrounding text-generative and embedding models +Models can be exported from NVIDIA’s API catalog with NVIDIA NIM, which is included with the NVIDIA AI Enterprise license, and run them on-premises, giving Enterprises ownership of their customizations and full control of their IP and AI application. NIMs are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs are containers that provide interactive APIs for running inference on an AI Model.  + +Below is an example on how to use some common functionality surrounding text-generative and embedding models. ## Installation @@ -15,9 +17,9 @@ Below is an example on how to use some common functionality surrounding text-gen ## Setup **To get started:** -1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models -2. Click on your model of choice -3. Under Input select the Python tab, and click Get API Key. Then click Generate Key +1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models. +2. Click on your model of choice. +3. Under Input select the Python tab, and click `Get API Key`. Then click `Generate Key`. 4. 
Copy and save the generated key as NVIDIA_API_KEY. From there, you should have access to the endpoints. ```python @@ -39,6 +41,26 @@ result = llm.invoke("Write a ballad about LangChain.") print(result.content) ``` +## Working with NVIDIA NIMs +When ready to deploy, you can self-host models with NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—and run them anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications. + +[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/) + +See how here [how to download and launch a NIM in your environment]() + +```python +from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings, NVIDIARerank + +# connect to an chat NIM running at localhost:8000 +llm = ChatNVIDIA(base_url="http://localhost:8000/v1") + +# connect to an embedding NIM running at localhost:2016 +embedder = NVIDIAEmbeddings(base_url="http://localhost:2016/v1") + +# connect to a reranking NIM running at localhost:1976 +ranker = NVIDIARerank(base_url="http://localhost:1976/v1") +``` + ## Stream, Batch, and Async These models natively support streaming, and as is the case with all LangChain LLMs they expose a batch method to handle concurrent requests, as well as async methods for invoke, stream, and batch. Below are a few examples. diff --git a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb index ee817725..2ca4f792 100644 --- a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb @@ -9,16 +9,15 @@ "source": [ "# NVIDIA AI Foundation Endpoints\n", "\n", - "The `ChatNVIDIA` class is a LangChain chat model that connects to [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/).\n", + "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations for chat models and embeddings powered by [NVIDIA AI Foundation Models](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), and hosted on [NVIDIA API Catalog.](https://build.nvidia.com/)\n", "\n", + "NVIDIA AI Foundation models are community and NVIDIA-built models and are NVIDIA-optimized to deliver the best performance on NVIDIA accelerated infrastructure.  Using the API, you can query live endpoints available on the NVIDIA API Catalog to get quick results from a DGX-hosted cloud compute environment. All models are source-accessible and can be deployed on your own compute cluster using NVIDIA NIM which is part of NVIDIA AI Enterprise.\n", "\n", - "> [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) give users easy access to NVIDIA hosted API endpoints for NVIDIA AI Foundation Models like Mixtral 8x7B, Llama 2, Stable Diffusion, etc. These models, hosted on the [NVIDIA API catalog](https://build.nvidia.com/), are optimized, tested, and hosted on the NVIDIA AI platform, making them fast and easy to evaluate, further customize, and seamlessly run at peak performance on any accelerated stack.\n", - "> \n", - "> With [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), you can get quick results from a fully accelerated stack running on [NVIDIA DGX Cloud](https://www.nvidia.com/en-us/data-center/dgx-cloud/). 
Once customized, these models can be deployed anywhere with enterprise-grade security, stability, and support using [NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/).\n", - "> \n", - "> These models can be easily accessed via the [`langchain-nvidia-ai-endpoints`](https://pypi.org/project/langchain-nvidia-ai-endpoints/) package, as shown below.\n", + "Models can be exported from NVIDIA’s API catalog with NVIDIA NIM, which is included with the NVIDIA AI Enterprise license, and run them on-premises, giving Enterprises ownership of their customizations and full control of their IP and AI application. NIMs are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs are containers that provide interactive APIs for running inference on an AI Model. \n", "\n", - "This example goes over how to use LangChain to interact with and develop LLM-powered systems using the publicly-accessible AI Foundation endpoints." + "This example goes over how to use LangChain to interact with the supported [NVIDIA Retrieval QA Embedding Model](https://build.nvidia.com/nvidia/embed-qa-4) for [retrieval-augmented generation](https://developer.nvidia.com/blog/build-enterprise-retrieval-augmented-generation-apps-with-nvidia-retrieval-qa-embedding-model/) via the `NVIDIAEmbeddings` class.\n", + "\n", + "For more information on accessing the chat models through this api, check out the [ChatNVIDIA](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/) documentation." ] }, { @@ -50,9 +49,9 @@ "\n", "**To get started:**\n", "\n", - "1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models\n", + "1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models.\n", "\n", - "2. Click on your model of choice\n", + "2. Click on your model of choice.\n", "\n", "3. Under `Input` select the `Python` tab, and click `Get API Key`. Then click `Generate Key`.\n", "\n", @@ -69,8 +68,11 @@ "import getpass\n", "import os\n", "\n", - "if not os.environ.get(\"NVIDIA_API_KEY\", \"\").startswith(\"nvapi-\"):\n", - " nvapi_key = getpass.getpass(\"Enter your NVIDIA API key: \")\n", + "# del os.environ['NVIDIA_API_KEY'] ## delete key and reset\n", + "if os.environ.get(\"NVIDIA_API_KEY\", \"\").startswith(\"nvapi-\"):\n", + " print(\"Valid NVIDIA_API_KEY already in environment. Delete to reset\")\n", + "else:\n", + " nvapi_key = getpass.getpass(\"NVAPI Key (starts with nvapi-): \")\n", " assert nvapi_key.startswith(\"nvapi-\"), f\"{nvapi_key[:5]}... 
is not a valid key\"\n", " os.environ[\"NVIDIA_API_KEY\"] = nvapi_key" ] @@ -96,6 +98,32 @@ "print(result.content)" ] }, + { + "cell_type": "markdown", + "id": "9d35686b", + "metadata": {}, + "source": [ + "## Working with NVIDIA NIMs\n", + "When ready to deploy, you can self-host models with NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—and run them anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications.\n", + "\n", + "[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)\n", + "\n", + "See how here [how to download and launch a NIM in your environment]()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "49838930", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_nvidia_ai_endpoints import ChatNVIDIA\n", + "\n", + "# connect to an embedding NIM running at localhost:8000\n", + "llm = ChatNVIDIA(base_url=\"http://localhost:8000/v1\")" + ] + }, { "cell_type": "markdown", "id": "71d37987-d568-4a73-9d2a-8bd86323f8bf", @@ -334,7 +362,7 @@ "source": [ "## Multimodal\n", "\n", - "NVIDIA also supports multimodal inputs, meaning you can provide both images and text for the model to reason over. An example model supporting multimodal inputs is `playground_neva_22b`.\n", + "NVIDIA also supports multimodal inputs, meaning you can provide both images and text for the model to reason over. An example model supporting multimodal inputs is `nvidia/neva-22b`.\n", "\n", "\n", "These models accept LangChain's standard image formats, and accept `labels`, similar to the Steering LLMs above. In addition to `creativity`, `complexity`, and `verbosity`, these models support a `quality` toggle.\n", @@ -367,7 +395,7 @@ "source": [ "from langchain_nvidia_ai_endpoints import ChatNVIDIA\n", "\n", - "llm = ChatNVIDIA(model=\"playground_neva_22b\")" + "llm = ChatNVIDIA(model=\"nvidia/neva-22b\")" ] }, { @@ -500,7 +528,7 @@ "source": [ "from langchain_nvidia_ai_endpoints import ChatNVIDIA\n", "\n", - "kosmos = ChatNVIDIA(model=\"kosmos_2\")\n", + "kosmos = ChatNVIDIA(model=\"microsoft/kosmos-2\")\n", "\n", "from langchain_core.messages import HumanMessage\n", "\n", @@ -544,7 +572,7 @@ "\n", "\n", "## Override the payload passthrough. 
Default is to pass through the payload as is.\n", - "kosmos = ChatNVIDIA(model=\"kosmos_2\")\n", + "kosmos = ChatNVIDIA(model=\"microsoft/kosmos-2\")\n", "kosmos.client.payload_fn = drop_streaming_key\n", "\n", "kosmos.invoke(\n", @@ -701,16 +729,10 @@ }, "outputs": [], "source": [ - "conversation.invoke(\"Tell me about yourself.\")[\"response\"]" + "conversation.invoke(\"Tell me about yourself.\")[\n", + " \"response\"\n", + "]\n" ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "9a719bd3-755d-4a05-bda2-de132bf99314", - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": { @@ -718,9 +740,9 @@ "provenance": [] }, "kernelspec": { - "display_name": "Python (venvoss)", + "display_name": "Python 3 (ipykernel)", "language": "python", - "name": "venvoss" + "name": "python3" }, "language_info": { "codemirror_mode": { diff --git a/libs/ai-endpoints/docs/providers/nvidia.mdx b/libs/ai-endpoints/docs/providers/nvidia.mdx index 5e383626..0a87a03b 100644 --- a/libs/ai-endpoints/docs/providers/nvidia.mdx +++ b/libs/ai-endpoints/docs/providers/nvidia.mdx @@ -1,31 +1,39 @@ # NVIDIA -> [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) give users easy access to NVIDIA hosted API endpoints for NVIDIA AI Foundation Models like Mixtral 8x7B, Llama 2, Stable Diffusion, etc. These models, hosted on the [NVIDIA API Catalog](https://build.nvidia.com/), are optimized, tested, and hosted on the NVIDIA AI platform, making them fast and easy to evaluate, further customize, and seamlessly run at peak performance on any accelerated stack. -> -> With [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), you can get quick results from a fully accelerated stack running on [NVIDIA DGX Cloud](https://www.nvidia.com/en-us/data-center/dgx-cloud/). Once customized, these models can be deployed anywhere with enterprise-grade security, stability, and support using [NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/). -> -> These models can be easily accessed via the [`langchain-nvidia-ai-endpoints`](https://pypi.org/project/langchain-nvidia-ai-endpoints/) package, as shown below. +The `langchain-nvidia-ai-endpoints` package contains LangChain integrations for chat models and embeddings powered by [NVIDIA AI Foundation Models](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), and hosted on [NVIDIA API Catalog.](https://build.nvidia.com/) + +NVIDIA AI Foundation models are community and NVIDIA-built models and are NVIDIA-optimized to deliver the best performance on NVIDIA accelerated infrastructure.  Using the API, you can query live endpoints available on the NVIDIA API Catalog to get quick results from a DGX-hosted cloud compute environment. All models are source-accessible and can be deployed on your own compute cluster using NVIDIA NIM which is part of NVIDIA AI Enterprise. + +Models can be exported from NVIDIA’s API catalog with NVIDIA NIM, which is included with the NVIDIA AI Enterprise license, and run them on-premises, giving Enterprises ownership of their customizations and full control of their IP and AI application. NIMs are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs are containers that provide interactive APIs for running inference on an AI Model.  
+ +Below is an example on how to use some common functionality surrounding text-generative and embedding models. ## Installation -```bash -pip install -U langchain-nvidia-ai-endpoints +```python +pip install -U --quiet langchain-nvidia-ai-endpoints ``` ## Setup **To get started:** -1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models +1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models. -2. Click on your model of choice +2. Click on your model of choice. -3. Under `Input` select the `Python` tab, and click `Get API Key`. Then click `Generate Key`. +3. Under Input select the Python tab, and click `Get API Key`. Then click `Generate Key`. -4. Copy and save the generated key as `NVIDIA_API_KEY`. From there, you should have access to the endpoints. +4. Copy and save the generated key as NVIDIA_API_KEY. From there, you should have access to the endpoints. -```bash -export NVIDIA_API_KEY=nvapi-XXXXXXXXXXXXXXXXXXXXXXXXXX +```python +import getpass +import os + +if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"): + nvidia_api_key = getpass.getpass("Enter your NVIDIA API key: ") + assert nvidia_api_key.startswith("nvapi-"), f"{nvidia_api_key[:5]}... is not a valid key" + os.environ["NVIDIA_API_KEY"] = nvidia_api_key ``` ```python @@ -36,6 +44,26 @@ result = llm.invoke("Write a ballad about LangChain.") print(result.content) ``` +## Working with NVIDIA NIMs +When ready to deploy, you can self-host models with NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—and run them anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications. + +See how here [how to download and launch a NIM in your environment]() + +[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/) + +```python +from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings, NVIDIARerank + +# connect to an chat NIM running at localhost:8000 +llm = ChatNVIDIA(base_url="http://localhost:8000/v1") + +# connect to an embedding NIM running at localhost:2016 +embedder = NVIDIAEmbeddings(base_url="http://localhost:2016/v1") + +# connect to a reranking NIM running at localhost:1976 +ranker = NVIDIARerank(base_url="http://localhost:1976/v1") +``` + ## Using NVIDIA AI Foundation Endpoints A selection of NVIDIA AI Foundation models are supported directly in LangChain with familiar APIs. diff --git a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb index 38585759..86a5f6a9 100644 --- a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb +++ b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb @@ -1,5 +1,22 @@ { "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# NVIDIA AI Foundation Endpoints \n", + "\n", + "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations for chat models and embeddings powered by [NVIDIA AI Foundation Models](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), and hosted on [NVIDIA API Catalog.](https://build.nvidia.com/)\n", + "\n", + "NVIDIA AI Foundation models are community and NVIDIA-built models and are NVIDIA-optimized to deliver the best performance on NVIDIA accelerated infrastructure.  
Using the API, you can query live endpoints available on the NVIDIA API Catalog to get quick results from a DGX-hosted cloud compute environment. All models are source-accessible and can be deployed on your own compute cluster using NVIDIA NIM which is part of NVIDIA AI Enterprise.\n", + "\n", + "Models can be exported from NVIDIA’s API catalog with NVIDIA NIM, which is included with the NVIDIA AI Enterprise license, and run them on-premises, giving Enterprises ownership of their customizations and full control of their IP and AI application. NIMs are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs are containers that provide interactive APIs for running inference on an AI Model. \n", + "\n", + "This example goes over how to use LangChain to interact with the supported [NVIDIA Retrieval QA Embedding Model](https://build.nvidia.com/nvidia/embed-qa-4) for [retrieval-augmented generation](https://developer.nvidia.com/blog/build-enterprise-retrieval-augmented-generation-apps-with-nvidia-retrieval-qa-embedding-model/) via the `NVIDIAEmbeddings` class.\n", + "\n", + "For more information on accessing the chat models through this api, check out the [ChatNVIDIA](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/) documentation." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -13,13 +30,97 @@ "- Enhancing accuracy for single data sources" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Installation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet langchain-nvidia-ai-endpoints" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "**To get started:**\n", + "\n", + "1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models.\n", + "\n", + "2. Select the `Retrieval` tab, then select your model of choice.\n", + "\n", + "3. Under `Input` select the `Python` tab, and click `Get API Key`. Then click `Generate Key`.\n", + "\n", + "4. Copy and save the generated key as `NVIDIA_API_KEY`. From there, you should have access to the endpoints." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "import getpass\n", + "import os\n", + "\n", + "# del os.environ['NVIDIA_API_KEY'] ## delete key and reset\n", + "if os.environ.get(\"NVIDIA_API_KEY\", \"\").startswith(\"nvapi-\"):\n", + " print(\"Valid NVIDIA_API_KEY already in environment. Delete to reset\")\n", + "else:\n", + " nvapi_key = getpass.getpass(\"NVAPI Key (starts with nvapi-): \")\n", + " assert nvapi_key.startswith(\"nvapi-\"), f\"{nvapi_key[:5]}... is not a valid key\"\n", + " os.environ[\"NVIDIA_API_KEY\"] = nvapi_key" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working with NVIDIA NIMs\n", "\n", - "[ai.nvidia.com](http://ai.nvidia.com) hosts a variety of AI models accessible with an api key and the `langchain-nvidia-ai-endpoints` library. The use cases below operate in this mode by default." 
+ "When ready to deploy, you can self-host models with NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—and run them anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications.\n", + "\n", + "[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)\n", + "\n", + "\n", + "\n", + "See how here [how to download and launch a NIM in your environment]()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, NVIDIARerank\n", + "\n", + "# connect to an embedding NIM running at localhost:2016\n", + "embedder = NVIDIAEmbeddings(base_url=\"http://localhost:2016/v1\")\n", + "\n", + "# connect to a reranking NIM running at localhost:1976\n", + "ranker = NVIDIARerank(base_url=\"http://localhost:1976/v1\")" ] }, { @@ -255,48 +356,13 @@ "\n", "docs = ranker.compress_documents(query=query, documents=subset_docs)" ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Working with a local NIM\n", - "\n", - "[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)\n", - "\n", - "The `NVIDIAEmbeddings` and `NVIDIARerank` classes give you a way to work with local NIMs through `mode` switching." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "vscode": { - "languageId": "plaintext" - } - }, - "outputs": [], - "source": [ - "# connect to an embedding NIM running at localhost:2016\n", - "embedder = NVIDIAEmbeddings().mode(\"nim\", base_url=\"http://localhost:2016/v1\")\n", - "\n", - "# connect to a reranking NIM running at localhost:1976\n", - "ranker = NVIDIARerank().mode(\"nim\", base_url=\"http://localhost:1976/v1\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can rerun the examples above with this new `embedder` and `ranker`." - ] } ], "metadata": { "kernelspec": { - "display_name": "Python (venvoss)", + "display_name": "Python 3 (ipykernel)", "language": "python", - "name": "venvoss" + "name": "python3" }, "language_info": { "codemirror_mode": { diff --git a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb index f8f6d46c..d2f9687c 100644 --- a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb @@ -8,11 +8,11 @@ "source": [ "# NVIDIA AI Foundation Endpoints \n", "\n", - "> [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) give users easy access to NVIDIA hosted API endpoints for NVIDIA AI Foundation Models like Mixtral 8x7B, Llama 2, Stable Diffusion, etc. These models, hosted on the [NVIDIA API catalog](https://build.nvidia.com/), are optimized, tested, and hosted on the NVIDIA AI platform, making them fast and easy to evaluate, further customize, and seamlessly run at peak performance on any accelerated stack.\n", - "> \n", - "> With [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), you can get quick results from a fully accelerated stack running on [NVIDIA DGX Cloud](https://www.nvidia.com/en-us/data-center/dgx-cloud/). 
Once customized, these models can be deployed anywhere with enterprise-grade security, stability, and support using [NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/).\n", - "> \n", - "> These models can be easily accessed via the [`langchain-nvidia-ai-endpoints`](https://pypi.org/project/langchain-nvidia-ai-endpoints/) package, as shown below.\n", + "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations for chat models and embeddings powered by [NVIDIA AI Foundation Models](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), and hosted on [NVIDIA API Catalog.](https://build.nvidia.com/)\n", + "\n", + "NVIDIA AI Foundation models are community and NVIDIA-built models and are NVIDIA-optimized to deliver the best performance on NVIDIA accelerated infrastructure.  Using the API, you can query live endpoints available on the NVIDIA API Catalog to get quick results from a DGX-hosted cloud compute environment. All models are source-accessible and can be deployed on your own compute cluster using NVIDIA NIM which is part of NVIDIA AI Enterprise.\n", + "\n", + "Models can be exported from NVIDIA’s API catalog with NVIDIA NIM, which is included with the NVIDIA AI Enterprise license, and run them on-premises, giving Enterprises ownership of their customizations and full control of their IP and AI application. NIMs are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs are containers that provide interactive APIs for running inference on an AI Model. \n", "\n", "This example goes over how to use LangChain to interact with the supported [NVIDIA Retrieval QA Embedding Model](https://build.nvidia.com/nvidia/embed-qa-4) for [retrieval-augmented generation](https://developer.nvidia.com/blog/build-enterprise-retrieval-augmented-generation-apps-with-nvidia-retrieval-qa-embedding-model/) via the `NVIDIAEmbeddings` class.\n", "\n", @@ -45,9 +45,9 @@ "\n", "**To get started:**\n", "\n", - "1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models\n", + "1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models.\n", "\n", - "2. Select the `Retrieval` tab, then select your model of choice\n", + "2. Select the `Retrieval` tab, then select your model of choice.\n", "\n", "3. Under `Input` select the `Python` tab, and click `Get API Key`. Then click `Generate Key`.\n", "\n", @@ -124,6 +124,31 @@ "- `aembed_quey`/`embed_documents`: Asynchronous versions of the above." 
] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Working with NVIDIA NIMs\n", + "When ready to deploy, you can self-host models with NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—and run them anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications.\n", + "\n", + "[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)\n", + "\n", + "\n", + "See how here [how to download and launch a NIM in your environment]()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings\n", + "\n", + "# connect to an embedding NIM running at localhost:2016\n", + "embedder = NVIDIAEmbeddings(base_url=\"http://localhost:2016/v1\")" + ] + }, { "cell_type": "markdown", "metadata": { @@ -471,6 +496,13 @@ "\n", "chain.invoke({\"question\": \"where did harrison work\", \"language\": \"italian\"})" ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { @@ -478,9 +510,9 @@ "provenance": [] }, "kernelspec": { - "display_name": "Python (venvoss)", + "display_name": "Python 3 (ipykernel)", "language": "python", - "name": "venvoss" + "name": "python3" }, "language_info": { "codemirror_mode": { From 99225594abe67f4d59e2406090e02900f866db20 Mon Sep 17 00:00:00 2001 From: Daniel Glogowski Date: Wed, 22 May 2024 23:06:27 -0700 Subject: [PATCH 02/13] corrected typo in nims section --- libs/ai-endpoints/README.md | 2 +- libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb | 2 +- libs/ai-endpoints/docs/providers/nvidia.mdx | 2 +- libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb | 2 +- libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/libs/ai-endpoints/README.md b/libs/ai-endpoints/README.md index f7c0e8d9..f6ed999e 100644 --- a/libs/ai-endpoints/README.md +++ b/libs/ai-endpoints/README.md @@ -46,7 +46,7 @@ When ready to deploy, you can self-host models with NVIDIA NIM—which is includ [Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/) -See how here [how to download and launch a NIM in your environment]() +See here [how to download and launch a NIM in your environment]() ```python from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings, NVIDIARerank diff --git a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb index 2ca4f792..768edda3 100644 --- a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb @@ -108,7 +108,7 @@ "\n", "[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)\n", "\n", - "See how here [how to download and launch a NIM in your environment]()" + "See here [how to download and launch a NIM in your environment]()" ] }, { diff --git a/libs/ai-endpoints/docs/providers/nvidia.mdx b/libs/ai-endpoints/docs/providers/nvidia.mdx index 0a87a03b..2e8bcb5e 100644 --- a/libs/ai-endpoints/docs/providers/nvidia.mdx +++ b/libs/ai-endpoints/docs/providers/nvidia.mdx @@ -47,7 +47,7 @@ print(result.content) ## Working with NVIDIA 
NIMs When ready to deploy, you can self-host models with NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—and run them anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications. -See how here [how to download and launch a NIM in your environment]() +See here [how to download and launch a NIM in your environment]() [Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/) diff --git a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb index 86a5f6a9..037c557f 100644 --- a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb +++ b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb @@ -101,7 +101,7 @@ "\n", "\n", "\n", - "See how here [how to download and launch a NIM in your environment]()" + "See here [how to download and launch a NIM in your environment]()" ] }, { diff --git a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb index d2f9687c..3421f880 100644 --- a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb @@ -134,7 +134,7 @@ "[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)\n", "\n", "\n", - "See how here [how to download and launch a NIM in your environment]()" + "See here [how to download and launch a NIM in your environment]()" ] }, { From 353454f7dc0c4fb09680a21d271c8ee18f8f127d Mon Sep 17 00:00:00 2001 From: Daniel Glogowski Date: Thu, 23 May 2024 13:53:28 -0700 Subject: [PATCH 03/13] removed reference to NIMs docs. 
they're not public atm --- libs/ai-endpoints/README.md | 2 -- .../ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb | 4 +--- libs/ai-endpoints/docs/providers/nvidia.mdx | 2 -- .../ai-endpoints/docs/retrievers/nvidia_rerank.ipynb | 6 +----- .../docs/text_embedding/nvidia_ai_endpoints.ipynb | 12 +----------- 5 files changed, 3 insertions(+), 23 deletions(-) diff --git a/libs/ai-endpoints/README.md b/libs/ai-endpoints/README.md index f6ed999e..6148adf4 100644 --- a/libs/ai-endpoints/README.md +++ b/libs/ai-endpoints/README.md @@ -46,8 +46,6 @@ When ready to deploy, you can self-host models with NVIDIA NIM—which is includ [Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/) -See here [how to download and launch a NIM in your environment]() - ```python from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings, NVIDIARerank diff --git a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb index 768edda3..31cec783 100644 --- a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb @@ -106,9 +106,7 @@ "## Working with NVIDIA NIMs\n", "When ready to deploy, you can self-host models with NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—and run them anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications.\n", "\n", - "[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)\n", - "\n", - "See here [how to download and launch a NIM in your environment]()" + "[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)\n" ] }, { diff --git a/libs/ai-endpoints/docs/providers/nvidia.mdx b/libs/ai-endpoints/docs/providers/nvidia.mdx index 2e8bcb5e..3c72c461 100644 --- a/libs/ai-endpoints/docs/providers/nvidia.mdx +++ b/libs/ai-endpoints/docs/providers/nvidia.mdx @@ -47,8 +47,6 @@ print(result.content) ## Working with NVIDIA NIMs When ready to deploy, you can self-host models with NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—and run them anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications. 
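Once a NIM is serving locally, the client can be exercised the same way as the hosted API. The sketch below is illustrative only and assumes a chat NIM is listening at `localhost:8000` and serving `meta/llama3-8b-instruct`; adjust the URL and model name to match your deployment.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Point the client at the local NIM instead of the hosted API catalog.
# The base_url and model name below are assumptions; match them to your deployment.
llm = ChatNVIDIA(base_url="http://localhost:8000/v1", model="meta/llama3-8b-instruct")

prompt = ChatPromptTemplate.from_messages(
    [("system", "You are a helpful AI assistant."), ("user", "{input}")]
)
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"input": "Write a haiku about GPUs."}))
```

The same `base_url` pattern applies to the embedding and reranking clients shown in the snippet below.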
-See here [how to download and launch a NIM in your environment]() - [Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/) ```python diff --git a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb index 037c557f..0d2ce9da 100644 --- a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb +++ b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb @@ -97,11 +97,7 @@ "\n", "When ready to deploy, you can self-host models with NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—and run them anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications.\n", "\n", - "[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)\n", - "\n", - "\n", - "\n", - "See here [how to download and launch a NIM in your environment]()" + "[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)\n" ] }, { diff --git a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb index 3421f880..454680f0 100644 --- a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb @@ -131,10 +131,7 @@ "## Working with NVIDIA NIMs\n", "When ready to deploy, you can self-host models with NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—and run them anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications.\n", "\n", - "[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)\n", - "\n", - "\n", - "See here [how to download and launch a NIM in your environment]()" + "[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)\n" ] }, { @@ -496,13 +493,6 @@ "\n", "chain.invoke({\"question\": \"where did harrison work\", \"language\": \"italian\"})" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": { From b01c6063660bd693fd5d184d063e4b3f370723a8 Mon Sep 17 00:00:00 2001 From: Daniel Glogowski Date: Thu, 23 May 2024 22:35:22 -0700 Subject: [PATCH 04/13] added wording for API Catalog --- libs/ai-endpoints/README.md | 1 + libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb | 8 ++++++++ libs/ai-endpoints/docs/providers/nvidia.mdx | 1 + .../docs/text_embedding/nvidia_ai_endpoints.ipynb | 2 +- 4 files changed, 11 insertions(+), 1 deletion(-) diff --git a/libs/ai-endpoints/README.md b/libs/ai-endpoints/README.md index 6148adf4..e7341eee 100644 --- a/libs/ai-endpoints/README.md +++ b/libs/ai-endpoints/README.md @@ -32,6 +32,7 @@ if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"): os.environ["NVIDIA_API_KEY"] = nvidia_api_key ``` +## Working with NVIDIA API Catalog ```python ## Core LC Chat Interface from langchain_nvidia_ai_endpoints import ChatNVIDIA diff --git a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb index 31cec783..e09373d0 100644 --- 
a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb @@ -77,6 +77,14 @@ " os.environ[\"NVIDIA_API_KEY\"] = nvapi_key" ] }, + { + "cell_type": "markdown", + "id": "af0ce26b", + "metadata": {}, + "source": [ + "## Working with NVIDIA API Catalog" + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/libs/ai-endpoints/docs/providers/nvidia.mdx b/libs/ai-endpoints/docs/providers/nvidia.mdx index 3c72c461..da71a781 100644 --- a/libs/ai-endpoints/docs/providers/nvidia.mdx +++ b/libs/ai-endpoints/docs/providers/nvidia.mdx @@ -35,6 +35,7 @@ if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"): assert nvidia_api_key.startswith("nvapi-"), f"{nvidia_api_key[:5]}... is not a valid key" os.environ["NVIDIA_API_KEY"] = nvidia_api_key ``` +## Working with NVIDIA API Catalog ```python from langchain_nvidia_ai_endpoints import ChatNVIDIA diff --git a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb index 454680f0..55278141 100644 --- a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb @@ -91,7 +91,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Initialization\n", + "## Working with API Catalog\n", "\n", "When initializing an embedding model you can select a model by passing it, e.g. `ai-embed-qa-4` below, or use the default by not passing any arguments." ] From d4902aa1f5ab2fb8427c80cff593395ca9e0de55 Mon Sep 17 00:00:00 2001 From: Daniel Glogowski Date: Thu, 23 May 2024 22:37:26 -0700 Subject: [PATCH 05/13] more api catalog wording --- libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb index 55278141..08a76d1d 100644 --- a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb @@ -91,7 +91,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Working with API Catalog\n", + "## Working with NVIDIA API Catalog\n", "\n", "When initializing an embedding model you can select a model by passing it, e.g. `ai-embed-qa-4` below, or use the default by not passing any arguments." 
] From 14f5127cb9d1225b59c870c1096ec1472e01049d Mon Sep 17 00:00:00 2001 From: Daniel Glogowski Date: Thu, 23 May 2024 23:28:22 -0700 Subject: [PATCH 06/13] changed title to NIVIDA NIMs --- libs/ai-endpoints/README.md | 2 +- libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb | 2 +- libs/ai-endpoints/docs/providers/nvidia.mdx | 2 +- libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb | 2 +- libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/libs/ai-endpoints/README.md b/libs/ai-endpoints/README.md index e7341eee..b8a4ca2d 100644 --- a/libs/ai-endpoints/README.md +++ b/libs/ai-endpoints/README.md @@ -1,4 +1,4 @@ -# langchain-nvidia-ai-endpoints +# NVIDIA NIMs The `langchain-nvidia-ai-endpoints` package contains LangChain integrations for chat models and embeddings powered by [NVIDIA AI Foundation Models](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), and hosted on [NVIDIA API Catalog.](https://build.nvidia.com/) diff --git a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb index e09373d0..66e1f87e 100644 --- a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb @@ -7,7 +7,7 @@ "id": "cc6caafa" }, "source": [ - "# NVIDIA AI Foundation Endpoints\n", + "# NVIDIA NIMs\n", "\n", "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations for chat models and embeddings powered by [NVIDIA AI Foundation Models](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), and hosted on [NVIDIA API Catalog.](https://build.nvidia.com/)\n", "\n", diff --git a/libs/ai-endpoints/docs/providers/nvidia.mdx b/libs/ai-endpoints/docs/providers/nvidia.mdx index da71a781..fc98d10a 100644 --- a/libs/ai-endpoints/docs/providers/nvidia.mdx +++ b/libs/ai-endpoints/docs/providers/nvidia.mdx @@ -1,4 +1,4 @@ -# NVIDIA +# NVIDIA NIMs The `langchain-nvidia-ai-endpoints` package contains LangChain integrations for chat models and embeddings powered by [NVIDIA AI Foundation Models](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), and hosted on [NVIDIA API Catalog.](https://build.nvidia.com/) diff --git a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb index 0d2ce9da..35c552af 100644 --- a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb +++ b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# NVIDIA AI Foundation Endpoints \n", + "# NVIDIA NIMs \n", "\n", "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations for chat models and embeddings powered by [NVIDIA AI Foundation Models](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), and hosted on [NVIDIA API Catalog.](https://build.nvidia.com/)\n", "\n", diff --git a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb index 08a76d1d..5722ef8e 100644 --- a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb @@ -6,7 +6,7 @@ "id": "GDDVue_1cq6d" }, "source": [ - "# NVIDIA AI Foundation Endpoints \n", + "# NVIDIA NIMs \n", "\n", "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations for chat models and embeddings powered by [NVIDIA AI Foundation 
Models](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), and hosted on [NVIDIA API Catalog.](https://build.nvidia.com/)\n", "\n", From b5feb177f5bc2b973532b1b7301c4e2d7d97a40b Mon Sep 17 00:00:00 2001 From: Daniel Glogowski Date: Fri, 24 May 2024 20:51:01 -0700 Subject: [PATCH 07/13] add pip install langchain-community --- .../docs/text_embedding/nvidia_ai_endpoints.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb index 5722ef8e..f8da6a8a 100644 --- a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb @@ -404,7 +404,7 @@ }, "outputs": [], "source": [ - "%pip install --upgrade --quiet langchain faiss-cpu tiktoken\n", + "%pip install --upgrade --quiet langchain faiss-cpu tiktoken langchain_community\n", "\n", "from operator import itemgetter\n", "\n", @@ -514,7 +514,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.3" + "version": "3.10.13" } }, "nbformat": 4, From bcd1249b355f65910e73cad23d3686caa5cd0052 Mon Sep 17 00:00:00 2001 From: Daniel Glogowski Date: Fri, 24 May 2024 20:56:45 -0700 Subject: [PATCH 08/13] nemotron deprecated. removed --- .../docs/retrievers/nvidia_rerank.ipynb | 112 +++++++----------- 1 file changed, 41 insertions(+), 71 deletions(-) diff --git a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb index 35c552af..3ab4077a 100644 --- a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb +++ b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb @@ -39,13 +39,20 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "vscode": { - "languageId": "plaintext" + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.0\u001b[0m\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n", + "Note: you may need to restart the kernel to use updated packages.\n" + ] } - }, - "outputs": [], + ], "source": [ "%pip install --upgrade --quiet langchain-nvidia-ai-endpoints" ] @@ -69,12 +76,8 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "vscode": { - "languageId": "plaintext" - } - }, + "execution_count": 2, + "metadata": {}, "outputs": [], "source": [ "import getpass\n", @@ -103,11 +106,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "vscode": { - "languageId": "plaintext" - } - }, + "metadata": {}, "outputs": [], "source": [ "from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, NVIDIARerank\n", @@ -139,12 +138,8 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "vscode": { - "languageId": "plaintext" - } - }, + "execution_count": 3, + "metadata": {}, "outputs": [], "source": [ "query = \"What is the meaning of life?\"" @@ -165,25 +160,28 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "vscode": { - "languageId": "plaintext" + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + 
"output_type": "stream", + "text": [ + "\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.0\u001b[0m\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n", + "Note: you may need to restart the kernel to use updated packages.\n" + ] } - }, - "outputs": [], + ], "source": [ "%pip install --upgrade --quiet langchain-community elasticsearch" ] }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "vscode": { - "languageId": "plaintext" - } - }, + "execution_count": 5, + "metadata": {}, "outputs": [], "source": [ "import elasticsearch\n", @@ -198,11 +196,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "vscode": { - "languageId": "plaintext" - } - }, + "metadata": {}, "outputs": [], "source": [ "bm25_docs = bm25_retriever.invoke(query)" @@ -220,11 +214,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "vscode": { - "languageId": "plaintext" - } - }, + "metadata": {}, "outputs": [], "source": [ "%pip install --upgrade --quiet langchain-community langchain-nvidia-ai-endpoints faiss-gpu" @@ -233,11 +223,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "vscode": { - "languageId": "plaintext" - } - }, + "metadata": {}, "outputs": [], "source": [ "from langchain_community.vectorstores import FAISS\n", @@ -259,11 +245,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "vscode": { - "languageId": "plaintext" - } - }, + "metadata": {}, "outputs": [], "source": [ "sem_docs = sem_retriever.get_relevant_documents(query)" @@ -281,11 +263,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "vscode": { - "languageId": "plaintext" - } - }, + "metadata": {}, "outputs": [], "source": [ "from langchain_nvidia_ai_endpoints import NVIDIARerank\n", @@ -309,11 +287,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "vscode": { - "languageId": "plaintext" - } - }, + "metadata": {}, "outputs": [], "source": [ "%pip install --upgrade --quiet langchain langchain-nvidia-ai-endpoints pgvector psycopg langchain-postgres" @@ -331,11 +305,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "vscode": { - "languageId": "plaintext" - } - }, + "metadata": {}, "outputs": [], "source": [ "from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings\n", @@ -370,7 +340,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.3" + "version": "3.10.13" } }, "nbformat": 4, From 28d8b167eb36afcca9da1765ff49cb770581b674 Mon Sep 17 00:00:00 2001 From: Daniel Glogowski Date: Fri, 24 May 2024 21:00:01 -0700 Subject: [PATCH 09/13] removed more nemotron related errors --- .../docs/chat/nvidia_ai_endpoints.ipynb | 77 +------------------ 1 file changed, 1 insertion(+), 76 deletions(-) diff --git a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb index 66e1f87e..0612271b 100644 --- a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb @@ -286,81 +286,6 @@ " print(txt, end=\"\")" ] }, - { - "cell_type": "markdown", - "id": "642a618a-faa3-443e-99c3-67b8142f3c51", - "metadata": {}, - "source": [ - "## Steering LLMs\n", - "\n", - "> [SteerLM-optimized 
models](https://developer.nvidia.com/blog/announcing-steerlm-a-simple-and-practical-technique-to-customize-llms-during-inference/) supports \"dynamic steering\" of model outputs at inference time.\n", - "\n", - "This lets you \"control\" the complexity, verbosity, and creativity of the model via integer labels on a scale from 0 to 9. Under the hood, these are passed as a special type of assistant message to the model.\n", - "\n", - "The \"steer\" models support this type of input, such as `nemotron_steerlm_8b`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "36a96b1a-e3e7-4ae3-b4b0-9331b5eca04f", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain_nvidia_ai_endpoints import ChatNVIDIA\n", - "\n", - "llm = ChatNVIDIA(model=\"nemotron_steerlm_8b\")\n", - "# Try making it uncreative and not verbose\n", - "complex_result = llm.invoke(\n", - " \"What's a PB&J?\", labels={\"creativity\": 0, \"complexity\": 3, \"verbosity\": 0}\n", - ")\n", - "print(\"Un-creative\\n\")\n", - "print(complex_result.content)\n", - "\n", - "# Try making it very creative and verbose\n", - "print(\"\\n\\nCreative\\n\")\n", - "creative_result = llm.invoke(\n", - " \"What's a PB&J?\", labels={\"creativity\": 9, \"complexity\": 3, \"verbosity\": 9}\n", - ")\n", - "print(creative_result.content)" - ] - }, - { - "cell_type": "markdown", - "id": "75849e7a-2adf-4038-8d9d-8a9e12417789", - "metadata": {}, - "source": [ - "#### Use within LCEL\n", - "\n", - "The labels are passed as invocation params. You can `bind` these to the LLM using the `bind` method on the LLM to include it within a declarative, functional chain. Below is an example." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "ae1105c3-2a0c-4db3-916e-24d5e427bd01", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain_core.output_parsers import StrOutputParser\n", - "from langchain_core.prompts import ChatPromptTemplate\n", - "from langchain_nvidia_ai_endpoints import ChatNVIDIA\n", - "\n", - "prompt = ChatPromptTemplate.from_messages(\n", - " [(\"system\", \"You are a helpful AI assistant named Fred.\"), (\"user\", \"{input}\")]\n", - ")\n", - "chain = (\n", - " prompt\n", - " | ChatNVIDIA(model=\"nemotron_steerlm_8b\").bind(\n", - " labels={\"creativity\": 9, \"complexity\": 0, \"verbosity\": 9}\n", - " )\n", - " | StrOutputParser()\n", - ")\n", - "\n", - "for txt in chain.stream({\"input\": \"Why is a PB&J?\"}):\n", - " print(txt, end=\"\")" - ] - }, { "cell_type": "markdown", "id": "7f465ff6-5922-41d8-8abb-1d1e4095cc27", @@ -760,7 +685,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.3" + "version": "3.10.13" } }, "nbformat": 4, From 73f706213ef84cd55dc9dd541ad53313977c0c08 Mon Sep 17 00:00:00 2001 From: Daniel Glogowski Date: Fri, 24 May 2024 21:17:33 -0700 Subject: [PATCH 10/13] nits --- libs/ai-endpoints/README.md | 2 +- .../docs/chat/nvidia_ai_endpoints.ipynb | 32 ++++++++--------- .../docs/retrievers/nvidia_rerank.ipynb | 36 ++++--------------- .../text_embedding/nvidia_ai_endpoints.ipynb | 4 +-- 4 files changed, 26 insertions(+), 48 deletions(-) diff --git a/libs/ai-endpoints/README.md b/libs/ai-endpoints/README.md index b8a4ca2d..c638781f 100644 --- a/libs/ai-endpoints/README.md +++ b/libs/ai-endpoints/README.md @@ -314,7 +314,7 @@ You can also connect to embeddings models through this package. 
Below is an exam ```python from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings -embedder = NVIDIAEmbeddings(model="ai-embed-qa-4") +embedder = NVIDIAEmbeddings(model="NV-Embed-QA") embedder.embed_query("What's the temperature today?") embedder.embed_documents([ "The temperature is 42 degrees.", diff --git a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb index 0612271b..feba0903 100644 --- a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb @@ -545,22 +545,22 @@ "metadata": {}, "outputs": [], "source": [ - "from langchain_core.messages import ChatMessage\n", - "from langchain_core.output_parsers import StrOutputParser\n", - "from langchain_core.prompts import ChatPromptTemplate\n", - "from langchain_nvidia_ai_endpoints import ChatNVIDIA\n", - "\n", - "prompt = ChatPromptTemplate.from_messages(\n", - " [\n", - " ChatMessage(\n", - " role=\"context\", content=\"Parrots and Cats have signed the peace accord.\"\n", - " ),\n", - " (\"user\", \"{input}\"),\n", - " ]\n", - ")\n", - "llm = ChatNVIDIA(model=\"nemotron_qa_8b\")\n", - "chain = prompt | llm | StrOutputParser()\n", - "chain.invoke({\"input\": \"What was signed?\"})" + "#from langchain_core.messages import ChatMessage\n", + "#rom langchain_core.output_parsers import StrOutputParser\n", + "#from langchain_core.prompts import ChatPromptTemplate\n", + "#from langchain_nvidia_ai_endpoints import ChatNVIDIA\n", + "\n", + "#prompt = ChatPromptTemplate.from_messages(\n", + "# [\n", + "# ChatMessage(\n", + "# role=\"context\", content=\"Parrots and Cats have signed the peace accord.\"\n", + "# ),\n", + "# (\"user\", \"{input}\"),\n", + "# ]\n", + "#)\n", + "#llm = ChatNVIDIA(model=\"meta/llama3-8b-instruct\")\n", + "#chain = prompt | llm | StrOutputParser()\n", + "#chain.invoke({\"input\": \"What was signed?\"})" ] }, { diff --git a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb index 3ab4077a..23bf88d1 100644 --- a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb +++ b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb @@ -39,20 +39,9 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.0\u001b[0m\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n", - "Note: you may need to restart the kernel to use updated packages.\n" - ] - } - ], + "outputs": [], "source": [ "%pip install --upgrade --quiet langchain-nvidia-ai-endpoints" ] @@ -76,7 +65,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -138,7 +127,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -160,27 +149,16 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip 
is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.0\u001b[0m\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n", - "Note: you may need to restart the kernel to use updated packages.\n" - ] - } - ], + "outputs": [], "source": [ "%pip install --upgrade --quiet langchain-community elasticsearch" ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ diff --git a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb index f8da6a8a..2b6d5b32 100644 --- a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb @@ -93,7 +93,7 @@ "source": [ "## Working with NVIDIA API Catalog\n", "\n", - "When initializing an embedding model you can select a model by passing it, e.g. `ai-embed-qa-4` below, or use the default by not passing any arguments." + "When initializing an embedding model you can select a model by passing it, e.g. `NV-Embed-QA` below, or use the default by not passing any arguments." ] }, { @@ -106,7 +106,7 @@ "source": [ "from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings\n", "\n", - "embedder = NVIDIAEmbeddings(model=\"ai-embed-qa-4\")" + "embedder = NVIDIAEmbeddings(model=\"NV-Embed-QA\")" ] }, { From 5a6ecd000144e7cca3d65413c0e85103fe7a6067 Mon Sep 17 00:00:00 2001 From: Hayden Wolff Date: Tue, 28 May 2024 11:25:58 -0700 Subject: [PATCH 11/13] Hwolff/docs updates nits --- libs/ai-endpoints/README.md | 28 -------- .../docs/chat/nvidia_ai_endpoints.ipynb | 6 +- libs/ai-endpoints/docs/providers/nvidia.mdx | 10 +-- .../docs/retrievers/nvidia_rerank.ipynb | 34 ++++++--- .../text_embedding/nvidia_ai_endpoints.ipynb | 16 ++--- .../langchain_nvidia_ai_endpoints/_common.py | 46 ++++++++++-- .../langchain_nvidia_ai_endpoints/_statics.py | 6 -- .../reranking.py | 59 +++++++++------ libs/ai-endpoints/pyproject.toml | 2 +- .../tests/integration_tests/conftest.py | 12 ++-- .../integration_tests/test_chat_models.py | 72 +++++++++++-------- .../integration_tests/test_embeddings.py | 41 +++++------ .../integration_tests/test_other_models.py | 4 +- .../tests/integration_tests/test_ranking.py | 54 ++++++++++---- .../tests/unit_tests/test_chat_models.py | 14 ++++ .../tests/unit_tests/test_embeddings.py | 10 --- 16 files changed, 241 insertions(+), 173 deletions(-) diff --git a/libs/ai-endpoints/README.md b/libs/ai-endpoints/README.md index c638781f..19337d4e 100644 --- a/libs/ai-endpoints/README.md +++ b/libs/ai-endpoints/README.md @@ -279,34 +279,6 @@ llm.invoke( ) ``` -## RAG: Context models - -NVIDIA also has Q&A models that support a special "context" chat message containing retrieved context (such as documents within a RAG chain). This is useful to avoid prompt-injecting the model. - -**Note:** Only "user" (human) and "context" chat messages are supported for these models, not system or AI messages useful in conversational flows. - -The `_qa_` models like `nemotron_qa_8b` support this. 
- -```python -from langchain_nvidia_ai_endpoints import ChatNVIDIA -from langchain_core.prompts import ChatPromptTemplate -from langchain_core.output_parsers import StrOutputParser -from langchain_core.messages import ChatMessage -prompt = ChatPromptTemplate.from_messages( - [ - ChatMessage(role="context", content="Parrots and Cats have signed the peace accord."), - ("user", "{input}") - ] -) -llm = ChatNVIDIA(model="nemotron_qa_8b") -chain = ( - prompt - | llm - | StrOutputParser() -) -chain.invoke({"input": "What was signed?"}) -``` - ## Embeddings You can also connect to embeddings models through this package. Below is an example: diff --git a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb index feba0903..b41c4e75 100644 --- a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb @@ -9,11 +9,11 @@ "source": [ "# NVIDIA NIMs\n", "\n", - "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations for chat models and embeddings powered by [NVIDIA AI Foundation Models](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), and hosted on [NVIDIA API Catalog.](https://build.nvidia.com/)\n", + "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations building applications with models on NVIDIA NIM Inference Microservice. NIMs support models like chat, embedding, and re-ranking models from the community, partners, and NVIDIA. These models are NVIDIA-optimized to deliver the best performance on NVIDIA accelerated infrastructure and deployed as NIMs: easy-to-use, prebuilt containers that deploy anywhere using a single command on NVIDIA accelerated infrastructure. \n", "\n", - "NVIDIA AI Foundation models are community and NVIDIA-built models and are NVIDIA-optimized to deliver the best performance on NVIDIA accelerated infrastructure.  Using the API, you can query live endpoints available on the NVIDIA API Catalog to get quick results from a DGX-hosted cloud compute environment. All models are source-accessible and can be deployed on your own compute cluster using NVIDIA NIM which is part of NVIDIA AI Enterprise.\n", + "NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing, NIMs, included with the NVIDIA AI Enterprise license, can be exported from NVIDIA’s API catalog and run on-premises or in the cloud, giving Enterprises ownership and full control of their IP and AI application. \n", "\n", - "Models can be exported from NVIDIA’s API catalog with NVIDIA NIM, which is included with the NVIDIA AI Enterprise license, and run them on-premises, giving Enterprises ownership of their customizations and full control of their IP and AI application. NIMs are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs are containers that provide interactive APIs for running inference on an AI Model. \n", + "NIMs are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs are containers that provide easy, consistent, and familiar APIs for running inference on an AI Model. 
\n", "\n", "This example goes over how to use LangChain to interact with the supported [NVIDIA Retrieval QA Embedding Model](https://build.nvidia.com/nvidia/embed-qa-4) for [retrieval-augmented generation](https://developer.nvidia.com/blog/build-enterprise-retrieval-augmented-generation-apps-with-nvidia-retrieval-qa-embedding-model/) via the `NVIDIAEmbeddings` class.\n", "\n", diff --git a/libs/ai-endpoints/docs/providers/nvidia.mdx b/libs/ai-endpoints/docs/providers/nvidia.mdx index fc98d10a..4d950a0e 100644 --- a/libs/ai-endpoints/docs/providers/nvidia.mdx +++ b/libs/ai-endpoints/docs/providers/nvidia.mdx @@ -1,10 +1,10 @@ -# NVIDIA NIMs +# NVIDIA -The `langchain-nvidia-ai-endpoints` package contains LangChain integrations for chat models and embeddings powered by [NVIDIA AI Foundation Models](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), and hosted on [NVIDIA API Catalog.](https://build.nvidia.com/) +The `langchain-nvidia-ai-endpoints` package contains LangChain integrations building applications with models on NVIDIA NIM Inference Microservice. NIMs support models like chat, embedding, and re-ranking models from the community, partners, and NVIDIA. These models are NVIDIA-optimized to deliver the best performance on NVIDIA accelerated infrastructure and deployed as NIMs: easy-to-use, prebuilt containers that deploy anywhere using a single command on NVIDIA accelerated infrastructure. -NVIDIA AI Foundation models are community and NVIDIA-built models and are NVIDIA-optimized to deliver the best performance on NVIDIA accelerated infrastructure.  Using the API, you can query live endpoints available on the NVIDIA API Catalog to get quick results from a DGX-hosted cloud compute environment. All models are source-accessible and can be deployed on your own compute cluster using NVIDIA NIM which is part of NVIDIA AI Enterprise. +NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing, NIMs, included with the NVIDIA AI Enterprise license, can be exported from NVIDIA’s API catalog and run on-premises or in the cloud, giving Enterprises ownership and full control of their IP and AI application. -Models can be exported from NVIDIA’s API catalog with NVIDIA NIM, which is included with the NVIDIA AI Enterprise license, and run them on-premises, giving Enterprises ownership of their customizations and full control of their IP and AI application. NIMs are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs are containers that provide interactive APIs for running inference on an AI Model.  +NIMs are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs are containers that provide easy, consistent, and familiar APIs for running inference on an AI Model.  Below is an example on how to use some common functionality surrounding text-generative and embedding models. @@ -45,6 +45,8 @@ result = llm.invoke("Write a ballad about LangChain.") print(result.content) ``` +Using the API, you can query live endpoints available on the NVIDIA API Catalog to get quick results from a DGX-hosted cloud compute environment. 
All models are source-accessible and can be deployed on your own compute cluster using NVIDIA NIM, which is part of NVIDIA AI Enterprise, as shown in the next section, [Working with NVIDIA NIMs](#working-with-nvidia-nims).
+
 ## Working with NVIDIA NIMs
 When ready to deploy, you can self-host models with NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—and run them anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications.
 
 [Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)
diff --git a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb
index 23bf88d1..62706e9d 100644
--- a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb
+++ b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb
@@ -4,17 +4,17 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# NVIDIA NIMs \n",
+    "# NVIDIA NIMs\n",
     "\n",
-    "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations for chat models and embeddings powered by [NVIDIA AI Foundation Models](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), and hosted on [NVIDIA API Catalog.](https://build.nvidia.com/)\n",
+    "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations building applications with models on NVIDIA NIM Inference Microservice. NIMs support models like chat, embedding, and re-ranking models from the community, partners, and NVIDIA. These models are NVIDIA-optimized to deliver the best performance on NVIDIA accelerated infrastructure and deployed as NIMs: easy-to-use, prebuilt containers that deploy anywhere using a single command on NVIDIA accelerated infrastructure. \n",
     "\n",
-    "NVIDIA AI Foundation models are community and NVIDIA-built models and are NVIDIA-optimized to deliver the best performance on NVIDIA accelerated infrastructure.  Using the API, you can query live endpoints available on the NVIDIA API Catalog to get quick results from a DGX-hosted cloud compute environment. All models are source-accessible and can be deployed on your own compute cluster using NVIDIA NIM which is part of NVIDIA AI Enterprise.\n",
+    "NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing, NIMs, included with the NVIDIA AI Enterprise license, can be exported from NVIDIA’s API catalog and run on-premises or in the cloud, giving Enterprises ownership and full control of their IP and AI application. \n",
     "\n",
-    "Models can be exported from NVIDIA’s API catalog with NVIDIA NIM, which is included with the NVIDIA AI Enterprise license, and run them on-premises, giving Enterprises ownership of their customizations and full control of their IP and AI application. NIMs are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs are containers that provide interactive APIs for running inference on an AI Model. \n",
+    "NIMs are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs are containers that provide easy, consistent, and familiar APIs for running inference on an AI Model.
 \n",
     "\n",
-    "This example goes over how to use LangChain to interact with the supported [NVIDIA Retrieval QA Embedding Model](https://build.nvidia.com/nvidia/embed-qa-4) for [retrieval-augmented generation](https://developer.nvidia.com/blog/build-enterprise-retrieval-augmented-generation-apps-with-nvidia-retrieval-qa-embedding-model/) via the `NVIDIAEmbeddings` class.\n",
+    "This example goes over how to use LangChain to interact with a NIM for a re-ranking model as well as a NIM for embeddings via LangChain's `NVIDIARerank` and `NVIDIAEmbeddings` classes. The example demonstrates how a re-ranking model can be used to combine retrieval results and improve accuracy during retrieval of documents.\n",
     "\n",
-    "For more information on accessing the chat models through this api, check out the [ChatNVIDIA](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/) documentation."
+    "For more information on accessing the chat models through this API, check out the [ChatNVIDIA](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/) documentation."
    ]
   },
   {
@@ -113,16 +113,16 @@
    "source": [
     "### Combining results from multiple sources\n",
     "\n",
-    "Consider a pipeline with data from a semantic store, such as FAISS, as well as a BM25 store.\n",
+    "Consider a pipeline with data from a BM25 store as well as a semantic store, such as FAISS. \n",
     "\n",
-    "Each store is queried independently and returns results that the individual store considers to be highly relevant. Figuring out the overall relevance of the results is where reranking comes into play."
+    "Each store is queried independently and returns results that the individual store considers to be highly relevant. Figuring out the overall relevance of the results is where re-ranking comes into play."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We will search for information about the query `What is the meaning of life?` across a BM25 store and semantic store."
+    "We will search for information about the query `What is the meaning of life?` across both a BM25 store and a semantic store."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
     ")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Return the relevant documents from the query `\"What is the meaning of life?\"` with the BM25 retriever."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
     " allow_dangerous_deserialization=allow_dangerous_deserialization).as_retriever()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Return the relevant documents from the query `\"What is the meaning of life?\"` with the FAISS semantic store."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
     "#### Combine and rank documents\n",
     "\n",
-    "The resulting `docs` will be ordered by their relevance to the query."
+    "Let's combine the BM25 as well as semantic search results. The resulting `docs` will be ordered by their relevance to the query by the reranking NIM."
] }, { diff --git a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb index 2b6d5b32..ee947e86 100644 --- a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb @@ -8,15 +8,15 @@ "source": [ "# NVIDIA NIMs \n", "\n", - "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations for chat models and embeddings powered by [NVIDIA AI Foundation Models](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), and hosted on [NVIDIA API Catalog.](https://build.nvidia.com/)\n", + "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations building applications with models on NVIDIA NIM Inference Microservice. NIMs support models like chat, embedding, and re-ranking models from the community, partners, and NVIDIA. These models are NVIDIA-optimized to deliver the best performance on NVIDIA accelerated infrastructure and deployed as NIMs: easy-to-use, prebuilt containers that deploy anywhere using a single command on NVIDIA accelerated infrastructure. \n", "\n", - "NVIDIA AI Foundation models are community and NVIDIA-built models and are NVIDIA-optimized to deliver the best performance on NVIDIA accelerated infrastructure.  Using the API, you can query live endpoints available on the NVIDIA API Catalog to get quick results from a DGX-hosted cloud compute environment. All models are source-accessible and can be deployed on your own compute cluster using NVIDIA NIM which is part of NVIDIA AI Enterprise.\n", + "NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing, NIMs, included with the NVIDIA AI Enterprise license, can be exported from NVIDIA’s API catalog and run on-premises or in the cloud, giving Enterprises ownership and full control of their IP and AI application. \n", "\n", - "Models can be exported from NVIDIA’s API catalog with NVIDIA NIM, which is included with the NVIDIA AI Enterprise license, and run them on-premises, giving Enterprises ownership of their customizations and full control of their IP and AI application. NIMs are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs are containers that provide interactive APIs for running inference on an AI Model. \n", + "NIMs are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs are containers that provide easy, consistent, and familiar APIs for running inference on an AI Model. \n", "\n", "This example goes over how to use LangChain to interact with the supported [NVIDIA Retrieval QA Embedding Model](https://build.nvidia.com/nvidia/embed-qa-4) for [retrieval-augmented generation](https://developer.nvidia.com/blog/build-enterprise-retrieval-augmented-generation-apps-with-nvidia-retrieval-qa-embedding-model/) via the `NVIDIAEmbeddings` class.\n", "\n", - "For more information on accessing the chat models through this api, check out the [ChatNVIDIA](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/) documentation." + "For more information on accessing the chat models through this API, check out the [ChatNVIDIA](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/) documentation." 
] }, { @@ -84,14 +84,14 @@ "id": "l185et2kc8pS" }, "source": [ - "We should be able to see an embedding model among that list which can be used in conjunction with an LLM for effective RAG solutions. We can interface with this model pretty easily with the help of the `NVIDIAEmbeddings` model." + "We should be able to see an embedding model among that list which can be used in conjunction with an LLM for effective RAG solutions. We can interface with this model as well as other embedding models supported by NIM through the `NVIDIAEmbeddings` class." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Working with NVIDIA API Catalog\n", + "## Working with NIMs on the NVIDIA API Catalog\n", "\n", "When initializing an embedding model you can select a model by passing it, e.g. `NV-Embed-QA` below, or use the default by not passing any arguments." ] @@ -121,14 +121,14 @@ "\n", "- `embed_documents`: Generate passage embeddings for a list of documents which you would like to search over.\n", "\n", - "- `aembed_quey`/`embed_documents`: Asynchronous versions of the above." + "- `aembed_query`/`aembed_documents`: Asynchronous versions of the above." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Working with NVIDIA NIMs\n", + "## Working with self-hosted NVIDIA NIMs\n", "When ready to deploy, you can self-host models with NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—and run them anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications.\n", "\n", "[Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)\n" diff --git a/libs/ai-endpoints/langchain_nvidia_ai_endpoints/_common.py b/libs/ai-endpoints/langchain_nvidia_ai_endpoints/_common.py index 42086d0d..35074251 100644 --- a/libs/ai-endpoints/langchain_nvidia_ai_endpoints/_common.py +++ b/libs/ai-endpoints/langchain_nvidia_ai_endpoints/_common.py @@ -19,16 +19,18 @@ Tuple, Union, ) +from urllib.parse import urlparse import aiohttp import requests -from langchain_core._api import deprecated +from langchain_core._api import deprecated, warn_deprecated from langchain_core.pydantic_v1 import ( BaseModel, Field, PrivateAttr, SecretStr, root_validator, + validator, ) from requests.models import Response @@ -113,6 +115,17 @@ def headers(self) -> dict: ) return headers_ + @validator("base_url") + def validate_base_url(cls, v: str) -> str: + if v is not None: + result = urlparse(v) + # Ensure scheme and netloc (domain name) are present + if not (result.scheme and result.netloc): + raise ValueError( + f"Invalid base_url, minimally needs scheme and netloc: {v}" + ) + return v + @root_validator(pre=True) def validate_model(cls, values: Dict[str, Any]) -> Dict[str, Any]: """Validate and update model arguments, including API key and formatting""" @@ -534,7 +547,16 @@ class _NVIDIAClient(BaseModel): _default_model: str = "" model: str = Field(description="Name of the model to invoke") infer_endpoint: str = Field("{base_url}/chat/completions") - curr_mode: _MODE_TYPE = Field("nvidia") + curr_mode: _MODE_TYPE = Field("nvidia") # todo: remove this in 0.1 + is_hosted: bool = Field(True) + + def __init__(self, **kwargs: Any): + super().__init__(**kwargs) + if "base_url" in kwargs: + self.is_hosted = False + self.curr_mode = "nim" + self.client.endpoints["infer"] = self.infer_endpoint + self.client.endpoints["models"] = "{base_url}/models" 
#################################################################################### @@ -595,9 +617,9 @@ def available_functions(self) -> List[dict]: @property def available_models(self) -> List[Model]: """Map the available models that can be invoked.""" - if self.curr_mode == "nim": + if self.curr_mode == "nim" or not self.is_hosted: return self.__class__.get_available_models( - client=self, mode="nim", base_url=self.client.base_url + client=self, base_url=self.client.base_url ) else: return self.__class__.get_available_models(client=self) @@ -625,7 +647,12 @@ def get_available_models( **kwargs: Any, ) -> List[Model]: """Map the available models that can be invoked. Callable from class""" - nveclient = (client or cls(**kwargs)).mode(mode, **kwargs).client + if mode is not None: + warn_deprecated( + name="mode", since="0.0.17", removal="0.1.0", alternative="`base_url`" + ) + self = client or cls(**kwargs) + nveclient = self.client nveclient.reset_method_cache() out = sorted( [ @@ -637,7 +664,7 @@ def get_available_models( # nim model listing does not provide the type and we cannot know # the model name ahead of time to guess the type. # so we need to list all models. - if mode == "nim": + if mode == "nim" or not self.is_hosted: list_all = True if not filter: filter = cls.__name__ @@ -668,6 +695,11 @@ def get_binding_model(self) -> Optional[str]: return "" return self.model + @deprecated( + since="0.0.17", + removal="0.1.0", + alternative="`base_url` in constructor", + ) def mode( self, mode: Optional[_MODE_TYPE] = "nvidia", @@ -680,7 +712,7 @@ def mode( force_clone: bool = True, **kwargs: Any, ) -> Any: # todo: in python 3.11+ this should be typing.Self - """Return a client swapped to a different mode""" + """Deprecated: pass `base_url=...` to constructor instead.""" if isinstance(self, str): raise ValueError("Please construct the model before calling mode()") out = self if not force_clone else deepcopy(self) diff --git a/libs/ai-endpoints/langchain_nvidia_ai_endpoints/_statics.py b/libs/ai-endpoints/langchain_nvidia_ai_endpoints/_statics.py index 963ea49e..afc6434c 100644 --- a/libs/ai-endpoints/langchain_nvidia_ai_endpoints/_statics.py +++ b/libs/ai-endpoints/langchain_nvidia_ai_endpoints/_statics.py @@ -24,12 +24,6 @@ class Model(BaseModel): "api_type": "aifm", "alternative": "meta/llama2-70b", }, - "playground_nvolveqa_40k": { - "model_type": "embedding", - "api_type": "aifm", - "alternative": "NV-Embed-QA", - }, - "playground_nemotron_qa_8b": {"model_type": "qa", "api_type": "aifm"}, "playground_gemma_7b": { "model_type": "chat", "api_type": "aifm", diff --git a/libs/ai-endpoints/langchain_nvidia_ai_endpoints/reranking.py b/libs/ai-endpoints/langchain_nvidia_ai_endpoints/reranking.py index 3e78ef6d..d5f3d283 100644 --- a/libs/ai-endpoints/langchain_nvidia_ai_endpoints/reranking.py +++ b/libs/ai-endpoints/langchain_nvidia_ai_endpoints/reranking.py @@ -2,6 +2,7 @@ from typing import Any, Generator, List, Optional, Sequence +from langchain_core._api import deprecated, warn_deprecated from langchain_core.callbacks.manager import Callbacks from langchain_core.documents import Document from langchain_core.documents.compressor import BaseDocumentCompressor @@ -37,30 +38,50 @@ class Config: max_batch_size: int = Field( _default_batch_size, ge=1, description="The maximum batch size." ) + _is_hosted: bool = PrivateAttr(True) def __init__(self, **kwargs: Any): """ Create a new NVIDIARerank document compressor. - Unless you plan to use the "nim" mode, you need to provide an API key. 
Your - options are - - 0. Pass the key as the nvidia_api_key parameter. - 1. Pass the key as the api_key parameter. - 2. Set the NVIDIA_API_KEY environment variable, recommended. - Precedence is in the order listed above. + This class provides access to a NVIDIA NIM for reranking. By default, it + connects to a hosted NIM, but can be configured to connect to a local NIM + using the `base_url` parameter. An API key is required to connect to the + hosted NIM. + + Args: + model (str): The model to use for reranking. + nvidia_api_key (str): The API key to use for connecting to the hosted NIM. + api_key (str): Alternative to nvidia_api_key. + base_url (str): The base URL of the NIM to connect to. + + API Key: + - The recommended way to provide the API key is through the `NVIDIA_API_KEY` + environment variable. """ super().__init__(**kwargs) self._client = _NVIDIAClient( model=self.model, api_key=kwargs.get("nvidia_api_key", kwargs.get("api_key", None)), ) + if base_url := kwargs.get("base_url", None): + # todo: detect if the base_url points to hosted NIM, this depends on + # moving from NVCF inference to API Catalog inference + self._is_hosted = False + self._client.client.base_url = base_url + self._client.client.endpoints["infer"] = "{base_url}/ranking" + self._client.client.endpoints = { + "infer": "{base_url}/ranking", + "status": None, + "models": None, + } @property def available_models(self) -> List[Model]: """ Get a list of available models that work with NVIDIARerank. """ - if self._client.curr_mode == "nim": + if self._client.curr_mode == "nim" or not self._is_hosted: # local NIM supports a single model and no /models endpoint models = [ Model( @@ -102,8 +123,10 @@ def get_available_models( It is possible to get a list of all models, including those that are not chat models, by setting the list_all parameter to True. """ + if mode is not None: + warn_deprecated(since="0.0.17", removal="0.1.0", alternative="`base_url`") self = cls(**kwargs).mode(mode=mode, **kwargs) - if mode == "nim": + if mode == "nim" or not self._is_hosted: # ignoring list_all because there is one models = self.available_models else: @@ -116,6 +139,11 @@ def get_available_models( ) return models + @deprecated( + since="0.0.17", + removal="0.1.0", + alternative="`base_url` to constructor", + ) def mode( self, mode: Optional[_MODE_TYPE] = "nvidia", @@ -125,20 +153,7 @@ def mode( **kwargs: Any, ) -> NVIDIARerank: """ - Change the mode. - - There are two modes, "nvidia" and "nim". The "nvidia" mode is the default mode - and is used to interact with hosted NVIDIA AI endpoints. The "nim" mode is - used to interact with NVIDIA NIM endpoints, which are typically hosted - on-premises. - - For the "nvidia" mode, the "api_key" parameter is available to specify your - API key. If not specified, the NVIDIA_API_KEY environment variable will be used. - - For the "nim" mode, the "base_url" and "model" parameters are required. Set - base_url to the url of your NVIDIA NIM endpoint. For instance, - "https://localhost:9999/v1". Additionally, the "model" parameter must be set - to the name of the model inside the NIM. + Deprecated: use NVIDIARerank(base_url=...) instead. 
""" # set a default base_url for nim mode if not base_url and mode == "nim": diff --git a/libs/ai-endpoints/pyproject.toml b/libs/ai-endpoints/pyproject.toml index e1cad2b6..69b3b654 100644 --- a/libs/ai-endpoints/pyproject.toml +++ b/libs/ai-endpoints/pyproject.toml @@ -1,6 +1,6 @@ [tool.poetry] name = "langchain-nvidia-ai-endpoints" -version = "0.0.17" +version = "0.0.18" description = "An integration package connecting NVIDIA AI Endpoints and LangChain" authors = [] readme = "README.md" diff --git a/libs/ai-endpoints/tests/integration_tests/conftest.py b/libs/ai-endpoints/tests/integration_tests/conftest.py index 5ae928ad..98dde45a 100644 --- a/libs/ai-endpoints/tests/integration_tests/conftest.py +++ b/libs/ai-endpoints/tests/integration_tests/conftest.py @@ -9,7 +9,7 @@ def get_mode(config: pytest.Config) -> dict: nim_endpoint = config.getoption("--nim-endpoint") if nim_endpoint: - return dict(mode="nim", base_url=nim_endpoint) + return dict(base_url=nim_endpoint) return {} @@ -50,14 +50,14 @@ def pytest_generate_tests(metafunc: pytest.Metafunc) -> None: mode = get_mode(metafunc.config) def get_all_models() -> List[Model]: - return ChatNVIDIA().mode(**mode).get_available_models(list_all=True, **mode) + return ChatNVIDIA.get_available_models(list_all=True, **mode) if "chat_model" in metafunc.fixturenames: models = [ChatNVIDIA._default_model] if model := metafunc.config.getoption("chat_model_id"): models = [model] if metafunc.config.getoption("all_models"): - models = [model.id for model in ChatNVIDIA().mode(**mode).available_models] + models = [model.id for model in ChatNVIDIA(**mode).available_models] metafunc.parametrize("chat_model", models, ids=models) if "rerank_model" in metafunc.fixturenames: @@ -67,9 +67,7 @@ def get_all_models() -> List[Model]: # nim-mode reranking does not support model listing via /v1/models endpoint if metafunc.config.getoption("all_models"): if mode.get("mode", None) == "nim": - models = [ - model.id for model in NVIDIARerank().mode(**mode).available_models - ] + models = [model.id for model in NVIDIARerank(**mode).available_models] else: models = [ model.id @@ -89,7 +87,7 @@ def get_all_models() -> List[Model]: metafunc.parametrize("image_in_model", models, ids=models) if "qa_model" in metafunc.fixturenames: - models = ["nemotron_qa_8b"] + models = [] if metafunc.config.getoption("all_models"): models = [ model.id for model in get_all_models() if model.model_type == "qa" diff --git a/libs/ai-endpoints/tests/integration_tests/test_chat_models.py b/libs/ai-endpoints/tests/integration_tests/test_chat_models.py index f52bec98..29e76544 100644 --- a/libs/ai-endpoints/tests/integration_tests/test_chat_models.py +++ b/libs/ai-endpoints/tests/integration_tests/test_chat_models.py @@ -3,6 +3,7 @@ from typing import List import pytest +from langchain_core._api import LangChainDeprecationWarning from langchain_core.load.dump import dumps from langchain_core.load.load import loads from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, SystemMessage @@ -21,10 +22,20 @@ def test_chat_ai_endpoints(chat_model: str, mode: dict) -> None: """Test ChatNVIDIA wrapper.""" - chat = ChatNVIDIA( - model=chat_model, - temperature=0.7, - ).mode(**mode) + chat = ChatNVIDIA(model=chat_model, temperature=0.7, **mode) + message = HumanMessage(content="Hello") + response = chat.invoke([message]) + assert isinstance(response, BaseMessage) + assert isinstance(response.content, str) + + +def test_chat_ai_endpoints_deprecated(chat_model: str, mode: dict) -> None: + 
"""Test ChatNVIDIA wrapper.""" + with pytest.warns(LangChainDeprecationWarning): + chat = ChatNVIDIA( + model=chat_model, + temperature=0.7, + ).mode(**mode) message = HumanMessage(content="Hello") response = chat.invoke([message]) assert isinstance(response, BaseMessage) @@ -47,7 +58,7 @@ def test_chat_ai_endpoints_system_message(chat_model: str, mode: dict) -> None: if chat_model == "mamba_chat": pytest.skip(f"{chat_model} does not support system messages") - chat = ChatNVIDIA(model=chat_model, max_tokens=36).mode(**mode) + chat = ChatNVIDIA(model=chat_model, max_tokens=36, **mode) system_message = SystemMessage(content="You are to chat with the user.") human_message = HumanMessage(content="Hello") response = chat.invoke([system_message, human_message]) @@ -125,7 +136,7 @@ def test_messages( ) -> None: if not system and not exchange: pytest.skip("No messages to test") - chat = ChatNVIDIA(model=chat_model, max_tokens=36).mode(**mode) + chat = ChatNVIDIA(model=chat_model, max_tokens=36, **mode) response = chat.invoke(system + exchange) assert isinstance(response, BaseMessage) assert response.response_metadata["role"] == "assistant" @@ -137,7 +148,7 @@ def test_messages( def test_ai_endpoints_streaming(chat_model: str, mode: dict) -> None: """Test streaming tokens from ai endpoints.""" - llm = ChatNVIDIA(model=chat_model, max_tokens=36).mode(**mode) + llm = ChatNVIDIA(model=chat_model, max_tokens=36, **mode) cnt = 0 for token in llm.stream("I'm Pickle Rick"): @@ -148,7 +159,7 @@ def test_ai_endpoints_streaming(chat_model: str, mode: dict) -> None: async def test_ai_endpoints_astream(chat_model: str, mode: dict) -> None: """Test streaming tokens from ai endpoints.""" - llm = ChatNVIDIA(model=chat_model, max_tokens=35).mode(**mode) + llm = ChatNVIDIA(model=chat_model, max_tokens=35, **mode) cnt = 0 async for token in llm.astream("I'm Pickle Rick"): @@ -159,7 +170,7 @@ async def test_ai_endpoints_astream(chat_model: str, mode: dict) -> None: async def test_ai_endpoints_abatch(chat_model: str, mode: dict) -> None: """Test streaming tokens.""" - llm = ChatNVIDIA(model=chat_model, max_tokens=36).mode(**mode) + llm = ChatNVIDIA(model=chat_model, max_tokens=36, **mode) result = await llm.abatch(["I'm Pickle Rick", "I'm not Pickle Rick"]) for token in result: @@ -168,7 +179,7 @@ async def test_ai_endpoints_abatch(chat_model: str, mode: dict) -> None: async def test_ai_endpoints_abatch_tags(chat_model: str, mode: dict) -> None: """Test batch tokens.""" - llm = ChatNVIDIA(model=chat_model, max_tokens=55).mode(**mode) + llm = ChatNVIDIA(model=chat_model, max_tokens=55, **mode) result = await llm.abatch( ["I'm Pickle Rick", "I'm not Pickle Rick"], config={"tags": ["foo"]} @@ -179,7 +190,7 @@ async def test_ai_endpoints_abatch_tags(chat_model: str, mode: dict) -> None: def test_ai_endpoints_batch(chat_model: str, mode: dict) -> None: """Test batch tokens.""" - llm = ChatNVIDIA(model=chat_model, max_tokens=60).mode(**mode) + llm = ChatNVIDIA(model=chat_model, max_tokens=60, **mode) result = llm.batch(["I'm Pickle Rick", "I'm not Pickle Rick"]) for token in result: @@ -188,7 +199,7 @@ def test_ai_endpoints_batch(chat_model: str, mode: dict) -> None: async def test_ai_endpoints_ainvoke(chat_model: str, mode: dict) -> None: """Test invoke tokens.""" - llm = ChatNVIDIA(model=chat_model, max_tokens=60).mode(**mode) + llm = ChatNVIDIA(model=chat_model, max_tokens=60, **mode) result = await llm.ainvoke("I'm Pickle Rick", config={"tags": ["foo"]}) assert isinstance(result.content, str) @@ -196,7 +207,7 @@ async 
def test_ai_endpoints_ainvoke(chat_model: str, mode: dict) -> None: def test_ai_endpoints_invoke(chat_model: str, mode: dict) -> None: """Test invoke tokens.""" - llm = ChatNVIDIA(model=chat_model, max_tokens=60).mode(**mode) + llm = ChatNVIDIA(model=chat_model, max_tokens=60, **mode) result = llm.invoke("I'm Pickle Rick", config=dict(tags=["foo"])) assert isinstance(result.content, str) @@ -218,7 +229,7 @@ def test_ai_endpoints_invoke_max_tokens_negative( ) -> None: """Test invoke's max_tokens' bounds.""" with pytest.raises(Exception): - llm = ChatNVIDIA(model=chat_model, max_tokens=max_tokens).mode(**mode) + llm = ChatNVIDIA(model=chat_model, max_tokens=max_tokens, **mode) llm.invoke("Show me the tokens") assert llm.client.last_response.status_code == 422 @@ -227,7 +238,7 @@ def test_ai_endpoints_invoke_max_tokens_positive( chat_model: str, mode: dict, max_tokens: int = 21 ) -> None: """Test invoke's max_tokens.""" - llm = ChatNVIDIA(model=chat_model, max_tokens=max_tokens).mode(**mode) + llm = ChatNVIDIA(model=chat_model, max_tokens=max_tokens, **mode) result = llm.invoke("Show me the tokens") assert isinstance(result.content, str) assert "token_usage" in result.response_metadata @@ -241,10 +252,10 @@ def test_ai_endpoints_invoke_max_tokens_positive( @pytest.mark.xfail(reason="seed does not consistently control determinism") def test_ai_endpoints_invoke_seed_default(chat_model: str, mode: dict) -> None: """Test invoke's seed (default).""" - llm0 = ChatNVIDIA(model=chat_model).mode(**mode) # default seed should not repeat + llm0 = ChatNVIDIA(model=chat_model, **mode) # default seed should not repeat result0 = llm0.invoke("What's in a seed?") assert isinstance(result0.content, str) - llm1 = ChatNVIDIA(model=chat_model).mode(**mode) # default seed should not repeat + llm1 = ChatNVIDIA(model=chat_model, **mode) # default seed should not repeat result1 = llm1.invoke("What's in a seed?") assert isinstance(result1.content, str) # if this fails, consider setting a high temperature to avoid deterministic results @@ -263,7 +274,7 @@ def test_ai_endpoints_invoke_seed_default(chat_model: str, mode: dict) -> None: ], ) def test_ai_endpoints_invoke_seed_range(chat_model: str, mode: dict, seed: int) -> None: - llm = ChatNVIDIA(model=chat_model, seed=seed).mode(**mode) + llm = ChatNVIDIA(model=chat_model, seed=seed, **mode) llm.invoke("What's in a seed?") assert llm.client.last_response.status_code == 200 @@ -272,7 +283,7 @@ def test_ai_endpoints_invoke_seed_range(chat_model: str, mode: dict, seed: int) def test_ai_endpoints_invoke_seed_functional( chat_model: str, mode: dict, seed: int = 413 ) -> None: - llm = ChatNVIDIA(model=chat_model, seed=seed).mode(**mode) + llm = ChatNVIDIA(model=chat_model, seed=seed, **mode) result0 = llm.invoke("What's in a seed?") assert isinstance(result0.content, str) result1 = llm.invoke("What's in a seed?") @@ -289,7 +300,7 @@ def test_ai_endpoints_invoke_temperature_negative( ) -> None: """Test invoke's temperature (negative).""" with pytest.raises(Exception): - llm = ChatNVIDIA(model=chat_model, temperature=temperature).mode(**mode) + llm = ChatNVIDIA(model=chat_model, temperature=temperature, **mode) llm.invoke("What's in a temperature?") assert llm.client.last_response.status_code == 422 @@ -298,10 +309,10 @@ def test_ai_endpoints_invoke_temperature_negative( def test_ai_endpoints_invoke_temperature_positive(chat_model: str, mode: dict) -> None: """Test invoke's temperature (positive).""" # idea is to have a fixed seed and vary temperature to get different results - 
llm0 = ChatNVIDIA(model=chat_model, seed=608, templerature=0).mode(**mode) + llm0 = ChatNVIDIA(model=chat_model, seed=608, templerature=0, **mode) result0 = llm0.invoke("What's in a temperature?") assert isinstance(result0.content, str) - llm1 = ChatNVIDIA(model=chat_model, seed=608, templerature=1).mode(**mode) + llm1 = ChatNVIDIA(model=chat_model, seed=608, templerature=1, **mode) result1 = llm1.invoke("What's in a temperature?") assert isinstance(result1.content, str) assert result0.content != result1.content @@ -316,7 +327,7 @@ def test_ai_endpoints_invoke_top_p_negative( ) -> None: """Test invoke's top_p (negative).""" with pytest.raises(Exception): - llm = ChatNVIDIA(model=chat_model, top_p=top_p).mode(**mode) + llm = ChatNVIDIA(model=chat_model, top_p=top_p, **mode) llm.invoke("What's in a top_p?") assert llm.client.last_response.status_code == 422 @@ -325,10 +336,10 @@ def test_ai_endpoints_invoke_top_p_negative( def test_ai_endpoints_invoke_top_p_positive(chat_model: str, mode: dict) -> None: """Test invoke's top_p (positive).""" # idea is to have a fixed seed and vary top_p to get different results - llm0 = ChatNVIDIA(model=chat_model, seed=608, top_p=0.1).mode(**mode) + llm0 = ChatNVIDIA(model=chat_model, seed=608, top_p=0.1, **mode) result0 = llm0.invoke("What's in a top_p?") assert isinstance(result0.content, str) - llm1 = ChatNVIDIA(model=chat_model, seed=608, top_p=1).mode(**mode) + llm1 = ChatNVIDIA(model=chat_model, seed=608, top_p=1, **mode) result1 = llm1.invoke("What's in a top_p?") assert isinstance(result1.content, str) assert result0.content != result1.content @@ -336,23 +347,24 @@ def test_ai_endpoints_invoke_top_p_positive(chat_model: str, mode: dict) -> None @pytest.mark.skip("serialization support is broken, needs attention") def test_serialize_chatnvidia(chat_model: str, mode: dict) -> None: - llm = ChatNVIDIA(model=chat_model).mode(**mode) + llm = ChatNVIDIA(model=chat_model, **mode) model = loads(dumps(llm), valid_namespaces=["langchain_nvidia_ai_endpoints"]) result = model.invoke("What is there if there is nothing?") assert isinstance(result.content, str) def test_chat_available_models(mode: dict) -> None: - llm = ChatNVIDIA().mode(**mode) - models = llm.available_models + models = ChatNVIDIA(**mode).available_models assert len(models) >= 1 + for model in models: + assert isinstance(model, Model) # we don't have type information for local nim endpoints - if mode.get("mode", None) != "nim": + if "mode" in mode and mode["mode"] == "nvidia": assert all(model.model_type is not None for model in models) def test_chat_get_available_models(mode: dict) -> None: models = ChatNVIDIA.get_available_models(**mode) - assert len(models) > 0 + assert len(models) >= 1 for model in models: assert isinstance(model, Model) diff --git a/libs/ai-endpoints/tests/integration_tests/test_embeddings.py b/libs/ai-endpoints/tests/integration_tests/test_embeddings.py index f9af79b0..8f007002 100644 --- a/libs/ai-endpoints/tests/integration_tests/test_embeddings.py +++ b/libs/ai-endpoints/tests/integration_tests/test_embeddings.py @@ -12,44 +12,44 @@ def test_embed_query(embedding_model: str, mode: dict) -> None: """Test NVIDIA embeddings for a single query.""" query = "foo bar" - embedding = NVIDIAEmbeddings(model=embedding_model).mode(**mode) + embedding = NVIDIAEmbeddings(model=embedding_model, **mode) output = embedding.embed_query(query) - assert len(output) == 1024 + assert len(output) > 3 async def test_embed_query_async(embedding_model: str, mode: dict) -> None: """Test NVIDIA 
async embeddings for a single query.""" query = "foo bar" - embedding = NVIDIAEmbeddings(model=embedding_model).mode(**mode) + embedding = NVIDIAEmbeddings(model=embedding_model, **mode) output = await embedding.aembed_query(query) - assert len(output) == 1024 + assert len(output) > 3 def test_embed_documents_single(embedding_model: str, mode: dict) -> None: """Test NVIDIA embeddings for documents.""" documents = ["foo bar"] - embedding = NVIDIAEmbeddings(model=embedding_model).mode(**mode) + embedding = NVIDIAEmbeddings(model=embedding_model, **mode) output = embedding.embed_documents(documents) assert len(output) == 1 - assert len(output[0]) == 1024 # Assuming embedding size is 2048 + assert len(output[0]) > 3 def test_embed_documents_multiple(embedding_model: str, mode: dict) -> None: """Test NVIDIA embeddings for multiple documents.""" documents = ["foo bar", "bar foo", "foo"] - embedding = NVIDIAEmbeddings(model=embedding_model).mode(**mode) + embedding = NVIDIAEmbeddings(model=embedding_model, **mode) output = embedding.embed_documents(documents) assert len(output) == 3 - assert all(len(doc) == 1024 for doc in output) + assert all(len(doc) > 4 for doc in output) async def test_embed_documents_multiple_async(embedding_model: str, mode: dict) -> None: """Test NVIDIA async embeddings for multiple documents.""" documents = ["foo bar", "bar foo", "foo"] - embedding = NVIDIAEmbeddings(model=embedding_model).mode(**mode) + embedding = NVIDIAEmbeddings(model=embedding_model, **mode) output = await embedding.aembed_documents(documents) assert len(output) == 3 - assert all(len(doc) == 1024 for doc in output) + assert all(len(doc) > 4 for doc in output) def test_embed_available_models(mode: dict) -> None: @@ -57,8 +57,7 @@ def test_embed_available_models(mode: dict) -> None: pytest.skip(f"available_models test only valid against API Catalog, not {mode}") embedding = NVIDIAEmbeddings() models = embedding.available_models - assert len(models) >= 2 # nvolveqa_40k and ai-embed-qa-4 - assert "nvolveqa_40k" in [model.id for model in models] + assert len(models) >= 1 assert "ai-embed-qa-4" in [model.id for model in models] assert all(model.model_type is not None for model in models) @@ -77,25 +76,23 @@ def test_embed_available_models_cached() -> None: def test_embed_query_long_text(embedding_model: str, mode: dict) -> None: - embedding = NVIDIAEmbeddings(model=embedding_model).mode(**mode) + embedding = NVIDIAEmbeddings(model=embedding_model, **mode) text = "nvidia " * 2048 with pytest.raises(Exception): embedding.embed_query(text) def test_embed_documents_batched_texts(embedding_model: str, mode: dict) -> None: - embedding = NVIDIAEmbeddings(model=embedding_model).mode(**mode) + embedding = NVIDIAEmbeddings(model=embedding_model, **mode) count = NVIDIAEmbeddings._default_max_batch_size * 2 + 1 texts = ["nvidia " * 32] * count output = embedding.embed_documents(texts) assert len(output) == count - assert all(len(embedding) == 1024 for embedding in output) + assert all(len(embedding) > 3 for embedding in output) def test_embed_documents_mixed_long_texts(embedding_model: str, mode: dict) -> None: - if embedding_model == "nvolveqa_40k": - pytest.xfail("AI Foundation Model trucates by default") - embedding = NVIDIAEmbeddings(model=embedding_model).mode(**mode) + embedding = NVIDIAEmbeddings(model=embedding_model, **mode) count = NVIDIAEmbeddings._default_max_batch_size * 2 - 1 texts = ["nvidia " * 32] * count texts[len(texts) // 2] = "nvidia " * 2048 @@ -105,19 +102,17 @@ def 
test_embed_documents_mixed_long_texts(embedding_model: str, mode: dict) -> N @pytest.mark.parametrize("truncate", ["START", "END"]) def test_embed_query_truncate(embedding_model: str, mode: dict, truncate: str) -> None: - if embedding_model == "nvolveqa_40k": - pytest.xfail("AI Foundation Model does not support truncate option") - embedding = NVIDIAEmbeddings(model=embedding_model, truncate=truncate).mode(**mode) + embedding = NVIDIAEmbeddings(model=embedding_model, truncate=truncate, **mode) text = "nvidia " * 2048 output = embedding.embed_query(text) - assert len(output) == 1024 + assert len(output) > 3 @pytest.mark.parametrize("truncate", ["START", "END"]) def test_embed_documents_truncate( embedding_model: str, mode: dict, truncate: str ) -> None: - embedding = NVIDIAEmbeddings(model=embedding_model, truncate=truncate).mode(**mode) + embedding = NVIDIAEmbeddings(model=embedding_model, truncate=truncate, **mode) count = 10 texts = ["nvidia " * 32] * count texts[len(texts) // 2] = "nvidia " * 2048 diff --git a/libs/ai-endpoints/tests/integration_tests/test_other_models.py b/libs/ai-endpoints/tests/integration_tests/test_other_models.py index aeb9f8cb..b0199fd3 100644 --- a/libs/ai-endpoints/tests/integration_tests/test_other_models.py +++ b/libs/ai-endpoints/tests/integration_tests/test_other_models.py @@ -10,7 +10,7 @@ def test_chat_ai_endpoints_context_message(qa_model: str, mode: dict) -> None: """Test wrapper with context message.""" - chat = ChatNVIDIA(model=qa_model, max_tokens=36).mode(**mode) + chat = ChatNVIDIA(model=qa_model, max_tokens=36, **mode) context_message = BaseMessage( content="Once upon a time there was a little langchainer", type="context" ) @@ -22,7 +22,7 @@ def test_chat_ai_endpoints_context_message(qa_model: str, mode: dict) -> None: def test_image_in_models(image_in_model: str, mode: dict) -> None: try: - chat = ChatNVIDIA(model=image_in_model).mode(**mode) + chat = ChatNVIDIA(model=image_in_model, **mode) response = chat.invoke( [ HumanMessage( diff --git a/libs/ai-endpoints/tests/integration_tests/test_ranking.py b/libs/ai-endpoints/tests/integration_tests/test_ranking.py index 3c5c3a6c..8d79ad33 100644 --- a/libs/ai-endpoints/tests/integration_tests/test_ranking.py +++ b/libs/ai-endpoints/tests/integration_tests/test_ranking.py @@ -3,6 +3,7 @@ import faker import pytest +from langchain_core._api import LangChainDeprecationWarning from langchain_core.documents import Document from requests.exceptions import ConnectionError, MissingSchema @@ -61,7 +62,7 @@ def test_langchain_reranker_get_available_models_all(mode: dict) -> None: def test_langchain_reranker_available_models(mode: dict) -> None: - ranker = NVIDIARerank().mode(**mode) + ranker = NVIDIARerank(**mode) models = ranker.available_models assert len(models) > 0 for model in models: @@ -72,7 +73,20 @@ def test_langchain_reranker_available_models(mode: dict) -> None: def test_langchain_reranker_direct( query: str, documents: List[Document], rerank_model: str, mode: dict ) -> None: - ranker = NVIDIARerank(model=rerank_model).mode(**mode) + ranker = NVIDIARerank(model=rerank_model, **mode) + result_docs = ranker.compress_documents(documents=documents, query=query) + assert len(result_docs) > 0 + for doc in result_docs: + assert "relevance_score" in doc.metadata + assert doc.metadata["relevance_score"] is not None + assert isinstance(doc.metadata["relevance_score"], float) + + +def test_langchain_reranker_direct_deprecated( + query: str, documents: List[Document], rerank_model: str, mode: dict +) -> None: + 
with pytest.warns(LangChainDeprecationWarning): + ranker = NVIDIARerank(model=rerank_model).mode(**mode) result_docs = ranker.compress_documents(documents=documents, query=query) assert len(result_docs) > 0 for doc in result_docs: @@ -84,7 +98,7 @@ def test_langchain_reranker_direct( def test_langchain_reranker_direct_empty_docs( query: str, rerank_model: str, mode: dict ) -> None: - ranker = NVIDIARerank(model=rerank_model).mode(**mode) + ranker = NVIDIARerank(model=rerank_model, **mode) result_docs = ranker.compress_documents(documents=[], query=query) assert len(result_docs) == 0 @@ -94,7 +108,7 @@ def test_langchain_reranker_direct_top_n_negative( ) -> None: orig = NVIDIARerank.Config.validate_assignment NVIDIARerank.Config.validate_assignment = False - ranker = NVIDIARerank(model=rerank_model).mode(**mode) + ranker = NVIDIARerank(model=rerank_model, **mode) ranker.top_n = -100 NVIDIARerank.Config.validate_assignment = orig result_docs = ranker.compress_documents(documents=documents, query=query) @@ -104,7 +118,7 @@ def test_langchain_reranker_direct_top_n_negative( def test_langchain_reranker_direct_top_n_zero( query: str, documents: List[Document], rerank_model: str, mode: dict ) -> None: - ranker = NVIDIARerank(model=rerank_model).mode(**mode) + ranker = NVIDIARerank(model=rerank_model, **mode) ranker.top_n = 0 result_docs = ranker.compress_documents(documents=documents, query=query) assert len(result_docs) == 0 @@ -113,7 +127,7 @@ def test_langchain_reranker_direct_top_n_zero( def test_langchain_reranker_direct_top_n_one( query: str, documents: List[Document], rerank_model: str, mode: dict ) -> None: - ranker = NVIDIARerank(model=rerank_model).mode(**mode) + ranker = NVIDIARerank(model=rerank_model, **mode) ranker.top_n = 1 result_docs = ranker.compress_documents(documents=documents, query=query) assert len(result_docs) == 1 @@ -122,7 +136,7 @@ def test_langchain_reranker_direct_top_n_one( def test_langchain_reranker_direct_top_n_equal_len_docs( query: str, documents: List[Document], rerank_model: str, mode: dict ) -> None: - ranker = NVIDIARerank(model=rerank_model).mode(**mode) + ranker = NVIDIARerank(model=rerank_model, **mode) ranker.top_n = len(documents) result_docs = ranker.compress_documents(documents=documents, query=query) assert len(result_docs) == len(documents) @@ -131,7 +145,7 @@ def test_langchain_reranker_direct_top_n_equal_len_docs( def test_langchain_reranker_direct_top_n_greater_len_docs( query: str, documents: List[Document], rerank_model: str, mode: dict ) -> None: - ranker = NVIDIARerank(model=rerank_model).mode(**mode) + ranker = NVIDIARerank(model=rerank_model, **mode) ranker.top_n = len(documents) * 2 result_docs = ranker.compress_documents(documents=documents, query=query) assert len(result_docs) == len(documents) @@ -141,13 +155,13 @@ def test_langchain_reranker_direct_top_n_greater_len_docs( def test_rerank_invalid_max_batch_size( rerank_model: str, mode: dict, batch_size: int ) -> None: - ranker = NVIDIARerank(model=rerank_model).mode(**mode) + ranker = NVIDIARerank(model=rerank_model, **mode) with pytest.raises(ValueError): ranker.max_batch_size = batch_size def test_rerank_invalid_top_n(rerank_model: str, mode: dict) -> None: - ranker = NVIDIARerank(model=rerank_model).mode(**mode) + ranker = NVIDIARerank(model=rerank_model, **mode) with pytest.raises(ValueError): ranker.top_n = -10 @@ -173,7 +187,7 @@ def test_rerank_batching( ) -> None: assert len(documents) > batch_size, "test requires more documents" - ranker = 
NVIDIARerank(model=rerank_model).mode(**mode) + ranker = NVIDIARerank(model=rerank_model, **mode) ranker.top_n = top_n ranker.max_batch_size = batch_size result_docs = ranker.compress_documents(documents=documents, query=query) @@ -210,13 +224,29 @@ def test_rerank_batching( def test_langchain_reranker_direct_endpoint_bogus( query: str, documents: List[Document] ) -> None: - ranker = NVIDIARerank().mode(mode="nim", base_url="bogus") + ranker = NVIDIARerank(base_url="bogus") with pytest.raises(MissingSchema): ranker.compress_documents(documents=documents, query=query) def test_langchain_reranker_direct_endpoint_unavailable( query: str, documents: List[Document] +) -> None: + ranker = NVIDIARerank(base_url="http://localhost:12321") + with pytest.raises(ConnectionError): + ranker.compress_documents(documents=documents, query=query) + + +def test_langchain_reranker_direct_endpoint_bogus_deprecated( + query: str, documents: List[Document] +) -> None: + ranker = NVIDIARerank().mode(mode="nim", base_url="bogus") + with pytest.raises(MissingSchema): + ranker.compress_documents(documents=documents, query=query) + + +def test_langchain_reranker_direct_endpoint_unavailable_deprecated( + query: str, documents: List[Document] ) -> None: ranker = NVIDIARerank().mode(mode="nim", base_url="http://localhost:12321") with pytest.raises(ConnectionError): diff --git a/libs/ai-endpoints/tests/unit_tests/test_chat_models.py b/libs/ai-endpoints/tests/unit_tests/test_chat_models.py index c1a735bb..b018652b 100644 --- a/libs/ai-endpoints/tests/unit_tests/test_chat_models.py +++ b/libs/ai-endpoints/tests/unit_tests/test_chat_models.py @@ -55,3 +55,17 @@ def test_param_labels_deprecated() -> None: ChatNVIDIA() with pytest.deprecated_call(): ChatNVIDIA(labels={"label": 1.0}) + + +@pytest.mark.parametrize( + "base_url", + [ + "bogus", + "http:/", + "http://", + "http:/oops", + ], +) +def test_param_base_url_negative(base_url: str) -> None: + with pytest.raises(ValueError): + ChatNVIDIA(base_url=base_url) diff --git a/libs/ai-endpoints/tests/unit_tests/test_embeddings.py b/libs/ai-endpoints/tests/unit_tests/test_embeddings.py index bd7db762..78b0d310 100644 --- a/libs/ai-endpoints/tests/unit_tests/test_embeddings.py +++ b/libs/ai-endpoints/tests/unit_tests/test_embeddings.py @@ -83,16 +83,6 @@ def test_embed_documents_negative_input_list_mixed(embedding: NVIDIAEmbeddings) embedding.embed_documents(documents) # type: ignore -def test_embed_deprecated_nvolvqa_40k() -> None: - with warnings.catch_warnings(): - warnings.simplefilter("error") - NVIDIAEmbeddings() - with pytest.deprecated_call(): - NVIDIAEmbeddings(model="nvolveqa_40k") - with pytest.deprecated_call(): - NVIDIAEmbeddings(model="playground_nvolveqa_40k") - - def test_embed_max_length_deprecated() -> None: with warnings.catch_warnings(): warnings.simplefilter("error") From d9af72fa944d58dec80adc0a91a13a41429f50a7 Mon Sep 17 00:00:00 2001 From: Hayden Wolff Date: Tue, 28 May 2024 16:16:43 -0700 Subject: [PATCH 12/13] Amanda language fixes to NIM --- .../docs/chat/nvidia_ai_endpoints.ipynb | 17 ++++++++++++----- libs/ai-endpoints/docs/providers/nvidia.mdx | 18 ++++++++++++------ .../docs/retrievers/nvidia_rerank.ipynb | 17 ++++++++++++----- .../text_embedding/nvidia_ai_endpoints.ipynb | 17 ++++++++++++----- 4 files changed, 48 insertions(+), 21 deletions(-) diff --git a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb index b41c4e75..10a48667 100644 --- 
a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb
+++ b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb
@@ -9,11 +9,18 @@
   "source": [
    "# NVIDIA NIMs\n",
    "\n",
-    "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations building applications with models on NVIDIA NIM Inference Microservice. NIMs support models like chat, embedding, and re-ranking models from the community, partners, and NVIDIA. These models are NVIDIA-optimized to deliver the best performance on NVIDIA accelerated infrastructure and deployed as NIMs: easy-to-use, prebuilt containers that deploy anywhere using a single command on NVIDIA accelerated infrastructure. \n",
-    "\n",
-    "NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing, NIMs, included with the NVIDIA AI Enterprise license, can be exported from NVIDIA’s API catalog and run on-premises or in the cloud, giving Enterprises ownership and full control of their IP and AI application. \n",
-    "\n",
-    "NIMs are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs are containers that provide easy, consistent, and familiar APIs for running inference on an AI Model. \n",
+    "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations building applications with models on \n",
+    "NVIDIA NIM inference microservice. NIM supports models across domains like chat, embedding, and re-ranking models \n",
+    "from the community as well as NVIDIA. These models are optimized by NVIDIA to deliver the best performance on NVIDIA \n",
+    "accelerated infrastructure and deployed as a NIM, an easy-to-use, prebuilt container that deploys anywhere using a single \n",
+    "command on NVIDIA accelerated infrastructure.\n",
+    "\n",
+    "NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing, \n",
+    "NIMs can be exported from NVIDIA’s API catalog using the NVIDIA AI Enterprise license and run on-premises or in the cloud, \n",
+    "giving enterprises ownership and full control of their IP and AI application.\n",
+    "\n",
+    "NIMs are packaged as container images on a per model basis and are distributed as NGC container images through the NVIDIA NGC Catalog. \n",
+    "At their core, NIMs provide easy, consistent, and familiar APIs for running inference on an AI model.\n",
    "\n",
    "This example goes over how to use LangChain to interact with the supported [NVIDIA Retrieval QA Embedding Model](https://build.nvidia.com/nvidia/embed-qa-4) for [retrieval-augmented generation](https://developer.nvidia.com/blog/build-enterprise-retrieval-augmented-generation-apps-with-nvidia-retrieval-qa-embedding-model/) via the `NVIDIAEmbeddings` class.\n",
    "\n",
diff --git a/libs/ai-endpoints/docs/providers/nvidia.mdx b/libs/ai-endpoints/docs/providers/nvidia.mdx
index 4d950a0e..ec967e37 100644
--- a/libs/ai-endpoints/docs/providers/nvidia.mdx
+++ b/libs/ai-endpoints/docs/providers/nvidia.mdx
@@ -1,10 +1,16 @@
 # NVIDIA
-
-The `langchain-nvidia-ai-endpoints` package contains LangChain integrations building applications with models on NVIDIA NIM Inference Microservice. NIMs support models like chat, embedding, and re-ranking models from the community, partners, and NVIDIA. These models are NVIDIA-optimized to deliver the best performance on NVIDIA accelerated infrastructure and deployed as NIMs: easy-to-use, prebuilt containers that deploy anywhere using a single command on NVIDIA accelerated infrastructure.
-
-NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing, NIMs, included with the NVIDIA AI Enterprise license, can be exported from NVIDIA’s API catalog and run on-premises or in the cloud, giving Enterprises ownership and full control of their IP and AI application.
-
-NIMs are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs are containers that provide easy, consistent, and familiar APIs for running inference on an AI Model. 
+The `langchain-nvidia-ai-endpoints` package contains LangChain integrations building applications with models on
+NVIDIA NIM inference microservice. NIM supports models across domains like chat, embedding, and re-ranking models
+from the community as well as NVIDIA. These models are optimized by NVIDIA to deliver the best performance on NVIDIA
+accelerated infrastructure and deployed as a NIM, an easy-to-use, prebuilt container that deploys anywhere using a single
+command on NVIDIA accelerated infrastructure.
+
+NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing,
+NIMs can be exported from NVIDIA’s API catalog using the NVIDIA AI Enterprise license and run on-premises or in the cloud,
+giving enterprises ownership and full control of their IP and AI application.
+
+NIMs are packaged as container images on a per model basis and are distributed as NGC container images through the NVIDIA NGC Catalog.
+At their core, NIMs provide easy, consistent, and familiar APIs for running inference on an AI model.
 
 Below is an example on how to use some common functionality surrounding text-generative and embedding models.
 
diff --git a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb
index 62706e9d..ece7527c 100644
--- a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb
+++ b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb
@@ -6,11 +6,18 @@
   "source": [
    "# NVIDIA NIMs\n",
    "\n",
-    "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations building applications with models on NVIDIA NIM Inference Microservice. NIMs support models like chat, embedding, and re-ranking models from the community, partners, and NVIDIA. These models are NVIDIA-optimized to deliver the best performance on NVIDIA accelerated infrastructure and deployed as NIMs: easy-to-use, prebuilt containers that deploy anywhere using a single command on NVIDIA accelerated infrastructure. \n",
-    "\n",
-    "NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing, NIMs, included with the NVIDIA AI Enterprise license, can be exported from NVIDIA’s API catalog and run on-premises or in the cloud, giving Enterprises ownership and full control of their IP and AI application. \n",
-    "\n",
-    "NIMs are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs are containers that provide easy, consistent, and familiar APIs for running inference on an AI Model. \n",
+    "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations building applications with models on \n",
+    "NVIDIA NIM inference microservice. NIM supports models across domains like chat, embedding, and re-ranking models \n",
+    "from the community as well as NVIDIA. These models are optimized by NVIDIA to deliver the best performance on NVIDIA \n",
+    "accelerated infrastructure and deployed as a NIM, an easy-to-use, prebuilt container that deploys anywhere using a single \n",
+    "command on NVIDIA accelerated infrastructure.\n",
+    "\n",
+    "NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing, \n",
+    "NIMs can be exported from NVIDIA’s API catalog using the NVIDIA AI Enterprise license and run on-premises or in the cloud, \n",
+    "giving enterprises ownership and full control of their IP and AI application.\n",
+    "\n",
+    "NIMs are packaged as container images on a per model basis and are distributed as NGC container images through the NVIDIA NGC Catalog. \n",
+    "At their core, NIMs provide easy, consistent, and familiar APIs for running inference on an AI model.\n",
    "\n",
    "This example goes over how to use LangChain to interact with a NIM for a re-ranking model as well as a NIM for embeddings via LangChain's `NVIDIARerank` and `NVIDIAEmbeddings` classes. The example demonstrates how a re-ranking model can be used to combine retrieval results and improve accuracy during retrieval of documents.\n",
    "\n",
diff --git a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb
index ee947e86..cb878bc9 100644
--- a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb
+++ b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb
@@ -8,11 +8,18 @@
   "source": [
    "# NVIDIA NIMs \n",
    "\n",
-    "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations building applications with models on NVIDIA NIM Inference Microservice. NIMs support models like chat, embedding, and re-ranking models from the community, partners, and NVIDIA. These models are NVIDIA-optimized to deliver the best performance on NVIDIA accelerated infrastructure and deployed as NIMs: easy-to-use, prebuilt containers that deploy anywhere using a single command on NVIDIA accelerated infrastructure. \n",
-    "\n",
-    "NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing, NIMs, included with the NVIDIA AI Enterprise license, can be exported from NVIDIA’s API catalog and run on-premises or in the cloud, giving Enterprises ownership and full control of their IP and AI application. \n",
-    "\n",
-    "NIMs are packaged as container images on a per model/model family basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs are containers that provide easy, consistent, and familiar APIs for running inference on an AI Model. \n",
+    "The `langchain-nvidia-ai-endpoints` package contains LangChain integrations building applications with models on \n",
+    "NVIDIA NIM inference microservice. NIM supports models across domains like chat, embedding, and re-ranking models \n",
+    "from the community as well as NVIDIA. 
These models are optimized by NVIDIA to deliver the best performance on NVIDIA \n",
+    "accelerated infrastructure and deployed as a NIM, an easy-to-use, prebuilt container that deploys anywhere using a single \n",
+    "command on NVIDIA accelerated infrastructure.\n",
+    "\n",
+    "NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing, \n",
+    "NIMs can be exported from NVIDIA’s API catalog using the NVIDIA AI Enterprise license and run on-premises or in the cloud, \n",
+    "giving enterprises ownership and full control of their IP and AI application.\n",
+    "\n",
+    "NIMs are packaged as container images on a per model basis and are distributed as NGC container images through the NVIDIA NGC Catalog. \n",
+    "At their core, NIMs provide easy, consistent, and familiar APIs for running inference on an AI model.\n",
    "\n",
    "This example goes over how to use LangChain to interact with the supported [NVIDIA Retrieval QA Embedding Model](https://build.nvidia.com/nvidia/embed-qa-4) for [retrieval-augmented generation](https://developer.nvidia.com/blog/build-enterprise-retrieval-augmented-generation-apps-with-nvidia-retrieval-qa-embedding-model/) via the `NVIDIAEmbeddings` class.\n",
    "\n",

From 540d24cadb4634ab716fb766bec823647cb996e2 Mon Sep 17 00:00:00 2001
From: Daniel Glogowski
Date: Tue, 28 May 2024 16:41:06 -0700
Subject: [PATCH 13/13] nits and minor changes

---
 libs/ai-endpoints/README.md | 12 ++++++------
 .../ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb | 6 +++---
 libs/ai-endpoints/docs/providers/nvidia.mdx | 12 ++++++------
 .../ai-endpoints/docs/retrievers/nvidia_rerank.ipynb | 8 ++++----
 .../docs/text_embedding/nvidia_ai_endpoints.ipynb | 6 +++---
 5 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/libs/ai-endpoints/README.md b/libs/ai-endpoints/README.md
index 19337d4e..d798184c 100644
--- a/libs/ai-endpoints/README.md
+++ b/libs/ai-endpoints/README.md
@@ -50,14 +50,14 @@ When ready to deploy, you can self-host models with NVIDIA NIM—which is includ
 ```python
 from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings, NVIDIARerank
 
-# connect to an chat NIM running at localhost:8000
-llm = ChatNVIDIA(base_url="http://localhost:8000/v1")
+# connect to a chat NIM running at localhost:8000, specifying a specific model
+llm = ChatNVIDIA(base_url="http://localhost:8000/v1", model="meta-llama3-8b-instruct")
 
-# connect to an embedding NIM running at localhost:2016
-embedder = NVIDIAEmbeddings(base_url="http://localhost:2016/v1")
+# connect to an embedding NIM running at localhost:8080
+embedder = NVIDIAEmbeddings(base_url="http://localhost:8080/v1")
 
-# connect to a reranking NIM running at localhost:1976
-ranker = NVIDIARerank(base_url="http://localhost:1976/v1")
+# connect to a reranking NIM running at localhost:2016
+ranker = NVIDIARerank(base_url="http://localhost:2016/v1")
 ```
 
 ## Stream, Batch, and Async
diff --git a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb
index 10a48667..da6c5973 100644
--- a/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb
+++ b/libs/ai-endpoints/docs/chat/nvidia_ai_endpoints.ipynb
@@ -22,7 +22,7 @@
    "NIMs are packaged as container images on a per model basis and are distributed as NGC container images through the NVIDIA NGC Catalog. \n",
    "At their core, NIMs provide easy, consistent, and familiar APIs for running inference on an AI model.\n",
    "\n",
-    "This example goes over how to use LangChain to interact with the supported [NVIDIA Retrieval QA Embedding Model](https://build.nvidia.com/nvidia/embed-qa-4) for [retrieval-augmented generation](https://developer.nvidia.com/blog/build-enterprise-retrieval-augmented-generation-apps-with-nvidia-retrieval-qa-embedding-model/) via the `NVIDIAEmbeddings` class.\n",
+    "This example goes over how to use LangChain to interact with NVIDIA-supported models via the `ChatNVIDIA` class.\n",
    "\n",
    "For more information on accessing the chat models through this api, check out the [ChatNVIDIA](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/) documentation."
   ]
@@ -133,8 +133,8 @@
   "source": [
    "from langchain_nvidia_ai_endpoints import ChatNVIDIA\n",
    "\n",
-    "# connect to an embedding NIM running at localhost:8000\n",
-    "llm = ChatNVIDIA(base_url=\"http://localhost:8000/v1\")"
+    "# connect to a chat NIM running at localhost:8000, specifying a specific model\n",
+    "llm = ChatNVIDIA(base_url=\"http://localhost:8000/v1\", model=\"meta-llama3-8b-instruct\")"
   ]
  },
  {
diff --git a/libs/ai-endpoints/docs/providers/nvidia.mdx b/libs/ai-endpoints/docs/providers/nvidia.mdx
index ec967e37..0e9fa2e9 100644
--- a/libs/ai-endpoints/docs/providers/nvidia.mdx
+++ b/libs/ai-endpoints/docs/providers/nvidia.mdx
@@ -61,14 +61,14 @@ When ready to deploy, you can self-host models with NVIDIA NIM—which is includ
 ```python
 from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings, NVIDIARerank
 
-# connect to an chat NIM running at localhost:8000
-llm = ChatNVIDIA(base_url="http://localhost:8000/v1")
+# connect to a chat NIM running at localhost:8000, specifying a specific model
+llm = ChatNVIDIA(base_url="http://localhost:8000/v1", model="meta-llama3-8b-instruct")
 
-# connect to an embedding NIM running at localhost:2016
-embedder = NVIDIAEmbeddings(base_url="http://localhost:2016/v1")
+# connect to an embedding NIM running at localhost:8080
+embedder = NVIDIAEmbeddings(base_url="http://localhost:8080/v1")
 
-# connect to a reranking NIM running at localhost:1976
-ranker = NVIDIARerank(base_url="http://localhost:1976/v1")
+# connect to a reranking NIM running at localhost:2016
+ranker = NVIDIARerank(base_url="http://localhost:2016/v1")
 ```
 
 ## Using NVIDIA AI Foundation Endpoints
diff --git a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb
index ece7527c..0c36ec6a 100644
--- a/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb
+++ b/libs/ai-endpoints/docs/retrievers/nvidia_rerank.ipynb
@@ -107,11 +107,11 @@
   "source": [
    "from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, NVIDIARerank\n",
    "\n",
-    "# connect to an embedding NIM running at localhost:2016\n",
-    "embedder = NVIDIAEmbeddings(base_url=\"http://localhost:2016/v1\")\n",
+    "# connect to an embedding NIM running at localhost:8080\n",
+    "embedder = NVIDIAEmbeddings(base_url=\"http://localhost:8080/v1\")\n",
    "\n",
-    "# connect to a reranking NIM running at localhost:1976\n",
-    "ranker = NVIDIARerank(base_url=\"http://localhost:1976/v1\")"
+    "# connect to a reranking NIM running at localhost:2016\n",
+    "ranker = NVIDIARerank(base_url=\"http://localhost:2016/v1\")"
   ]
  },
  {
diff --git a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb
index cb878bc9..c6a133a8 100644
--- 
a/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb +++ b/libs/ai-endpoints/docs/text_embedding/nvidia_ai_endpoints.ipynb @@ -149,8 +149,8 @@ "source": [ "from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings\n", "\n", - "# connect to an embedding NIM running at localhost:2016\n", - "embedder = NVIDIAEmbeddings(base_url=\"http://localhost:2016/v1\")" + "# connect to an embedding NIM running at localhost:8080\n", + "embedder = NVIDIAEmbeddings(base_url=\"http://localhost:8080/v1\")" ] }, { @@ -437,7 +437,7 @@ "source": [ "vectorstore = FAISS.from_texts(\n", " [\"harrison worked at kensho\"],\n", - " embedding=NVIDIAEmbeddings(model=\"ai-embed-qa-4\"),\n", + " embedding=NVIDIAEmbeddings(model=\"NV-Embed-QA\"),\n", ")\n", "retriever = vectorstore.as_retriever()\n", "\n",