
Commit

feat: add support for version specification in Azure Group-level Config (#170)

* feat: add support for version specification in Azure Group-level Configuration, update docs

* Update azure.mdx typo

* Update azure.mdx
danny-avila authored Nov 25, 2024
1 parent eec5608 commit d968123
Showing 3 changed files with 32 additions and 25 deletions.
1 change: 1 addition & 0 deletions components/changelog/content/config_v1.1.8.mdx
@@ -0,0 +1 @@
- Added support for specifying `version` in [Azure Group-level Configuration](/docs/configuration/azure#group-level-configuration) when using [Serverless Inference Endpoints](/docs/configuration/azure#serverless-inference-endpoints)
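
To illustrate the entry above, here is a minimal sketch of a group-level config using the new `version` field with a serverless endpoint (the resource URL, env var name, and model name are placeholders); it mirrors the example added to `azure.mdx` in this commit, and the `version` value is sent as the `api-version` query parameter:

```yaml filename="librechat.yaml"
endpoints:
  azureOpenAI:
    groups:
      - group: "serverless-example"
        apiKey: "${SERVERLESS_API_KEY}" # arbitrary env var name
        baseURL: "https://example.services.ai.azure.com/models/"
        version: "2024-05-01-preview" # optional API version
        serverless: true
        models:
          Meta-Llama-3.1-8B-Instruct: true # must match the deployment name
# Requests are then routed to:
#   https://example.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
```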
13 changes: 13 additions & 0 deletions pages/changelog/config_v1.1.8.mdx
@@ -0,0 +1,13 @@
---
date: 2024/11/25
title: ⚙️ Config v1.1.8
---

import { ChangelogHeader } from '@/components/changelog/ChangelogHeader'
import Content from '@/components/changelog/content/config_v1.1.8.mdx'

<ChangelogHeader />

---

<Content />
43 changes: 18 additions & 25 deletions pages/docs/configuration/azure.mdx
@@ -531,49 +531,42 @@ Remember to replace placeholder text with actual prompts or instructions and pro

### Serverless Inference Endpoints

- Through the `librechat.yaml` file, you can configure Azure AI Studio serverless inference endpoints to access models from the [Azure Model Catalog.](https://ai.azure.com/explore) Only a model identifier, `baseURL`, and `apiKey` are needed along with the `serverless` field to indicate the special handling these endpoints need.
+ Through the `librechat.yaml` file, you can configure Azure AI Studio serverless inference endpoints to access models from the [Azure AI Foundry.](https://ai.azure.com/explore) Only a model identifier, `baseURL`, and `apiKey` are needed along with the `serverless` field to indicate the special handling these endpoints need.

- You will need to follow the instructions in the compatible model cards to set up **MaaS** ("Models as a Service") access on Azure AI Studio.

- For reference, here are some known compatible model cards:

- - [Mistral-large](https://aka.ms/aistudio/landing/mistral-large) | [Llama-2-70b-chat](https://aka.ms/aistudio/landing/Llama-2-70b-chat) | [Phi-3-medium-128k-instruct](https://ai.azure.com/explore/models/Phi-3-medium-128k-instruct/version/1/registry/azureml)
+ - [Mistral-large](https://aka.ms/aistudio/landing/mistral-large) | [Meta-Llama-3.1-8B-Instruct](https://ai.azure.com/explore/models/Meta-Llama-3.1-8B-Instruct/version/4/) | [Phi-3-medium-128k-instruct](https://ai.azure.com/explore/models/Phi-3-medium-128k-instruct/version/1/registry/azureml)

- You can also review [the technical blog for the "Mistral-large" model release](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/mistral-large-mistral-ai-s-flagship-llm-debuts-on-azure-ai/ba-p/4066996) for more info.

- - Then, you will need to add them to your azureOpenAI config in the librechat.yaml file.
+ - Then, you will need to add them to your `azureOpenAI` config in the librechat.yaml file.

- - Here are example configurations for Mistral-large, LLama-2-70b-chat, and Phi-3-medium-128k-instruct:
+ - Here is an example configuration for `Meta-Llama-3.1-8B-Instruct`:

```yaml filename="librechat.yaml"
 endpoints:
   azureOpenAI:
     groups:
-      # serverless examples
-      - group: "mistral-inference"
-        apiKey: "${AZURE_MISTRAL_API_KEY}" # arbitrary env var name
-        baseURL: "https://Mistral-large-vnpet-serverless.region.inference.ai.azure.com/v1/chat/completions"
-        serverless: true
-        models:
-          mistral-large: true
-      - group: "llama-70b-chat"
-        apiKey: "${AZURE_LLAMA2_70B_API_KEY}" # arbitrary env var name
-        baseURL: "https://Llama-2-70b-chat-qmvyb-serverless.region.inference.ai.azure.com/v1/chat/completions"
-        serverless: true
-        models:
-          llama-70b-chat: true
-      - group: "phi-3-medium-128k-instruct"
-        apiKey: "${AZURE_PHI3_MEDIUM_API_KEY}" # arbitrary env var name
-        baseURL: "https://Phi-3-medium-128k-instruct-abcde-serverless.eastus2.inference.ai.azure.com/v1/chat/completions"
+      - group: "serverless-example"
+        apiKey: "${LLAMA318B_API_KEY}" # arbitrary env var name
+        baseURL: "https://example.services.ai.azure.com/models/"
+        version: "2024-05-01-preview" # Optional: specify API version
         serverless: true
         models:
-          phi-3-medium-128k-instruct: true
+          # Must match the deployment name of the model
+          Meta-Llama-3.1-8B-Instruct: true
```

**Notes**:

- - Make sure to add the appropriate suffix for your deployment, either "/v1/chat/completions" or "/v1/completions"
- - If using "/v1/completions" (without "chat"), you need to set the `forcePrompt` field to `true` in your [group config.](#group-level-configuration)
- - Compatibility with LibreChat relies on parity with OpenAI API specs, which at the time of writing, are typically **"Pay-as-you-go"** or "Models as a Service" (MaaS) deployments on Azure AI Studio, that are OpenAI-SDK-compatible with either v1/completions or v1/chat/completions endpoint handling.
+ - Azure AI Foundry models now provision endpoints under `/models/chat/completions?api-version=version` for serverless inference.
+ - The `baseURL` field should be set to the root of the endpoint, without anything after `/models/`, i.e., the `/chat/completions` path.
+   - Example: `https://example.services.ai.azure.com/models/` for `https://example.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview`
+ - The `version` query parameter is optional and can be specified in the `baseURL` field.
+ - The model name used in the `models` field must match the deployment name of the model in the Azure AI Foundry.
+ - Compatibility with LibreChat relies on parity with OpenAI API specs, which at the time of writing, are typically **"Pay-as-you-go"** or "Models as a Service" (MaaS) deployments on Azure AI Studio, that are OpenAI-SDK-compatible with either `v1/completions` or `models/chat/completions` endpoint handling.
- All models that offer serverless deployments ("Serverless APIs") are compatible from the Azure model catalog. You can filter by "Serverless API" under Deployment options and "Chat completion" under inference tasks to see the full list; however, real time endpoint models have not been tested.
- - These serverless inference endpoint/models are likely not compatible with OpenAI function calling, which enables the use of Plugins. As they have yet been tested, they are available on the Plugins endpoint, although they are not expected to work.
+ - These serverless inference endpoint/models may or may not support function calling according to OpenAI API specs, which enables their use with Agents.
+ - If using legacy "/v1/completions" (without "chat"), you need to set the `forcePrompt` field to `true` in your [group config.](#group-level-configuration)
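
For the legacy case in the last note, a minimal sketch of a group pointing at a `/v1/completions` serverless deployment with `forcePrompt` enabled (the endpoint URL, env var name, and model name are hypothetical placeholders):

```yaml filename="librechat.yaml"
endpoints:
  azureOpenAI:
    groups:
      - group: "legacy-completions-example"
        apiKey: "${LEGACY_SERVERLESS_API_KEY}" # arbitrary env var name
        baseURL: "https://example-model-serverless.region.inference.ai.azure.com/v1/completions"
        serverless: true
        forcePrompt: true # required when the endpoint is "/v1/completions" (without "chat")
        models:
          example-model: true
```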
