
Commit

v0.0.30 (#11)
pkelaita authored Aug 6, 2024
2 parents 7078afd + ee6adc2 commit 64b32fd
Showing 7 changed files with 217 additions and 132 deletions.
10 changes: 9 additions & 1 deletion CHANGELOG.md
@@ -1,9 +1,17 @@
# Changelog

_Current version: 0.0.29_
_Current version: 0.0.30_

[PyPi link](https://pypi.org/project/l2m2/)

### 0.0.30 - August 5, 2024

#### Added

- [Mistral](https://mistral.ai/) provider support via La Plateforme.
- [Mistral Large 2](https://mistral.ai/news/mistral-large-2407/) model availability from Mistral.
- Mistral 7B, Mixtral 8x7B, and Mixtral 8x22B model availability from Mistral, in addition to the existing providers.

### 0.0.29 - August 4, 2024

> [!CAUTION]
46 changes: 26 additions & 20 deletions README.md
@@ -1,12 +1,12 @@
# L2M2: A Simple Python LLM Manager 💬👍

[![Tests](https://github.com/pkelaita/l2m2/actions/workflows/tests.yml/badge.svg?timestamp=1722833303)](https://github.com/pkelaita/l2m2/actions/workflows/tests.yml) [![codecov](https://codecov.io/github/pkelaita/l2m2/graph/badge.svg?token=UWIB0L9PR8)](https://codecov.io/github/pkelaita/l2m2) [![PyPI version](https://badge.fury.io/py/l2m2.svg?timestamp=1722833303)](https://badge.fury.io/py/l2m2)
[![Tests](https://github.com/pkelaita/l2m2/actions/workflows/tests.yml/badge.svg?timestamp=1722903983)](https://github.com/pkelaita/l2m2/actions/workflows/tests.yml) [![codecov](https://codecov.io/github/pkelaita/l2m2/graph/badge.svg?token=UWIB0L9PR8)](https://codecov.io/github/pkelaita/l2m2) [![PyPI version](https://badge.fury.io/py/l2m2.svg?timestamp=1722903983)](https://badge.fury.io/py/l2m2)

**L2M2** ("LLM Manager" → "LLMM" → "L2M2") is a tiny and very simple LLM manager for Python that exposes lots of models through a unified API. This is useful for evaluations, demos, production applications, and other projects that need to be easily model-agnostic.
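For orientation, here is a minimal usage sketch. It is not part of this commit: `LLMClient()` and `print(response)` appear as diff context further down, and the import path is inferred from the `l2m2/client/` package, but the exact `call` signature shown here is an assumption.

```python
# Minimal sketch, not taken from the diff: assumes LLMClient() picks up provider
# API keys from the environment and exposes a call(model=..., prompt=...) method.
from l2m2.client import LLMClient

client = LLMClient()

response = client.call(
    model="gpt-4o",  # any supported model key from the table below
    prompt="What is the capital of France?",
    system_prompt="Answer in one short sentence.",  # assumed optional parameter
)
print(response)
```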

### Features

- <!--start-count-->21<!--end-count--> supported models (see below) – regularly updated and with more on the way.
- <!--start-count-->22<!--end-count--> supported models (see below) – regularly updated and with more on the way.
- Session chat memory – even across multiple models or with concurrent memory streams.
- JSON mode
- Prompt loading tools
@@ -37,9 +37,10 @@ L2M2 currently supports the following models:
| `claude-3-haiku` | [Anthropic](https://www.anthropic.com/api) | `claude-3-haiku-20240307` |
| `command-r` | [Cohere](https://docs.cohere.com/) | `command-r` |
| `command-r-plus` | [Cohere](https://docs.cohere.com/) | `command-r-plus` |
| `mistral-7b` | [OctoAI](https://octoai.cloud/) | `mistral-7b-instruct` |
| `mixtral-8x7b` | [Groq](https://wow.groq.com/), [OctoAI](https://octoai.cloud/) | `mixtral-8x7b-32768`, `mixtral-8x7b-instruct` |
| `mixtral-8x22b` | [OctoAI](https://octoai.cloud/) | `mixtral-8x22b-instruct` |
| `mixtral-large-2` | [Mistral](https://mistral.ai/) | `mistral-large-latest` |
| `mixtral-8x22b` | [Mistral](https://mistral.ai/), [OctoAI](https://octoai.cloud/) | `open-mixtral-8x22b`, `mixtral-8x22b-instruct` |
| `mixtral-8x7b` | [Mistral](https://mistral.ai/), [OctoAI](https://octoai.cloud/), [Groq](https://wow.groq.com/) | `open-mixtral-8x7b`, `mixtral-8x7b-instruct`, `mixtral-8x7b-32768` |
| `mistral-7b` | [Mistral](https://mistral.ai/), [OctoAI](https://octoai.cloud/) | `open-mistral-7b`, `mistral-7b-instruct` |
| `gemma-7b` | [Groq](https://wow.groq.com/) | `gemma-7b-it` |
| `llama3-8b` | [Groq](https://wow.groq.com/), [Replicate](https://replicate.com/) | `llama3-8b-8192`, `meta/meta-llama-3-8b-instruct` |
| `llama3-70b` | [Groq](https://wow.groq.com/), [Replicate](https://replicate.com/), [OctoAI](https://octoai.cloud/) | `llama3-70b-8192`, `meta/meta-llama-3-70b-instruct`, `meta-llama-3-70b-instruct` |
@@ -92,15 +93,16 @@ client = LLMClient()

To activate any of the providers, set the provider's API key in the corresponding environment variable shown below, and L2M2 will read it in to activate the provider.

| Provider | Environment Variable |
| --------- | --------------------- |
| OpenAI | `OPENAI_API_KEY` |
| Anthropic | `ANTHROPIC_API_KEY` |
| Cohere | `CO_API_KEY` |
| Google | `GOOGLE_API_KEY` |
| Groq | `GROQ_API_KEY` |
| Replicate | `REPLICATE_API_TOKEN` |
| OctoAI | `OCTOAI_TOKEN` |
| Provider | Environment Variable |
| ----------------------- | --------------------- |
| OpenAI | `OPENAI_API_KEY` |
| Anthropic | `ANTHROPIC_API_KEY` |
| Cohere | `CO_API_KEY` |
| Google | `GOOGLE_API_KEY` |
| Groq | `GROQ_API_KEY` |
| Replicate | `REPLICATE_API_TOKEN` |
| OctoAI | `OCTOAI_TOKEN` |
| Mistral (La Plateforme) | `MISTRAL_API_KEY` |
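
For example, a sketch of activating the newly added Mistral provider from Python rather than the shell is shown below; that the key is read when the client is constructed is an assumption about load order.

```python
# Sketch only: assumes L2M2 reads MISTRAL_API_KEY (see the table above)
# at client construction time to activate Mistral (La Plateforme).
import os

from l2m2.client import LLMClient

os.environ["MISTRAL_API_KEY"] = "<your-mistral-api-key>"  # normally exported in your shell

client = LLMClient()
```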

Additionally, you can activate providers programmatically as follows:
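
The concrete example is collapsed in this diff; the sketch below assumes an `add_provider(provider_name, api_key)` method, which is not visible in the shown lines.

```python
# Assumed API - the actual example lives in the collapsed portion of README.md.
from l2m2.client import LLMClient

client = LLMClient()
client.add_provider("mistral", "<your-mistral-api-key>")
```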

@@ -497,15 +499,19 @@ print(response)
> [!IMPORTANT]
> Regardless of the model, and even when `json_mode` is enabled, it's crucial to ensure that either the prompt or the system prompt instructs the model to return its output in JSON - and ideally specifies the JSON format, as shown above.
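
A hedged illustration of that guidance follows; the `json_mode` flag matches the parameter handled in this commit's client code, while the surrounding `call` usage is assumed.

```python
# Sketch: the prompt itself asks for JSON and spells out the expected keys,
# while json_mode=True requests native JSON output where the provider supports it.
response = client.call(
    model="mistral-7b",
    prompt='Describe Paris as JSON like {"city": ..., "country": ..., "population": ...}',
    json_mode=True,
)
print(response)
```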

The following models natively support JSON mode:
The following models natively support JSON mode via the given provider:

<!--start-json-native-->

- `gpt-4o` (Openai)
- `gpt-4o-mini` (Openai)
- `gpt-4-turbo` (Openai)
- `gpt-3.5-turbo` (Openai)
- `gemini-1.5-pro` (Google)
- `gpt-4o` (via Openai)
- `gpt-4o-mini` (via Openai)
- `gpt-4-turbo` (via Openai)
- `gpt-3.5-turbo` (via Openai)
- `gemini-1.5-pro` (via Google)
- `mixtral-large-2` (via Mistral)
- `mixtral-8x22b` (via Mistral)
- `mixtral-8x7b` (via Mistral)
- `mistral-7b` (via Mistral)

<!--end-json-native-->

2 changes: 1 addition & 1 deletion l2m2/__init__.py
@@ -1 +1 @@
__version__ = "0.0.29"
__version__ = "0.0.30"
162 changes: 78 additions & 84 deletions l2m2/client/base_llm_client.py
@@ -36,6 +36,7 @@
"groq": "GROQ_API_KEY",
"replicate": "REPLICATE_API_TOKEN",
"octoai": "OCTOAI_TOKEN",
"mistral": "MISTRAL_API_KEY",
}


@@ -487,6 +488,7 @@ async def _call_impl(
memory,
json_mode,
json_mode_strategy,
model_info["extras"],
)

# Handle JSON mode strategies for the output (but only if we don't have native support)
@@ -501,30 +503,56 @@
return str(result)

async def _call_openai(
self,
*args: Any,
) -> str:
return await self._generic_openai_spec_call("openai", *args)

async def _call_google(
self,
model_id: str,
prompt: str,
system_prompt: Optional[str],
params: Dict[str, Any],
timeout: Optional[int],
memory: Optional[BaseMemory],
*_: Any, # json_mode and json_mode_strategy are not used here
*_: Any,  # json_mode, json_mode_strategy, and extras are not used here
) -> str:
messages = []
data: Dict[str, Any] = {}

if system_prompt is not None:
messages.append({"role": "system", "content": system_prompt})
# Earlier models don't support system prompts, so prepend it to the prompt
if model_id not in ["gemini-1.5-pro"]:
prompt = f"{system_prompt}\n{prompt}"
else:
data["system_instruction"] = {"parts": {"text": system_prompt}}

messages: List[Dict[str, Any]] = []
if isinstance(memory, ChatMemory):
messages.extend(memory.unpack("role", "content", "user", "assistant"))
messages.append({"role": "user", "content": prompt})
mem_items = memory.unpack("role", "parts", "user", "model")
# Need to do this wrap – see https://ai.google.dev/api/rest/v1beta/cachedContents#Part
messages.extend([{**m, "parts": {"text": m["parts"]}} for m in mem_items])

messages.append({"role": "user", "parts": {"text": prompt}})

data["contents"] = messages
data["generation_config"] = params

result = await llm_post(
client=self.httpx_client,
provider="openai",
provider="google",
model_id=model_id,
api_key=self.api_keys["openai"],
data={"model": model_id, "messages": messages, **params},
api_key=self.api_keys["google"],
data=data,
timeout=timeout,
)
return str(result["choices"][0]["message"]["content"])
result = result["candidates"][0]

# Will sometimes fail due to safety filters
if "content" in result:
return str(result["content"]["parts"][0]["text"])
else:
return str(result)

async def _call_anthropic(
self,
Expand All @@ -536,6 +564,7 @@ async def _call_anthropic(
memory: Optional[BaseMemory],
json_mode: bool,
json_mode_strategy: JsonModeStrategy,
_: Dict[str, Any], # extras is not used here
) -> str:
if system_prompt is not None:
params["system"] = system_prompt
@@ -569,6 +598,7 @@ async def _call_cohere(
memory: Optional[BaseMemory],
json_mode: bool,
json_mode_strategy: JsonModeStrategy,
_: Dict[str, Any], # extras is not used here
) -> str:
if system_prompt is not None:
params["preamble"] = system_prompt
@@ -593,82 +623,15 @@

async def _call_groq(
self,
model_id: str,
prompt: str,
system_prompt: Optional[str],
params: Dict[str, Any],
timeout: Optional[int],
memory: Optional[BaseMemory],
json_mode: bool,
json_mode_strategy: JsonModeStrategy,
*args: Any,
) -> str:
messages = []
if system_prompt is not None:
messages.append({"role": "system", "content": system_prompt})
if isinstance(memory, ChatMemory):
messages.extend(memory.unpack("role", "content", "user", "assistant"))
messages.append({"role": "user", "content": prompt})

if json_mode:
append_msg = get_extra_message(json_mode_strategy)
if append_msg:
messages.append({"role": "assistant", "content": append_msg})

result = await llm_post(
client=self.httpx_client,
provider="groq",
model_id=model_id,
api_key=self.api_keys["groq"],
data={"model": model_id, "messages": messages, **params},
timeout=timeout,
)
return str(result["choices"][0]["message"]["content"])
return await self._generic_openai_spec_call("groq", *args)

async def _call_google(
async def _call_mistral(
self,
model_id: str,
prompt: str,
system_prompt: Optional[str],
params: Dict[str, Any],
timeout: Optional[int],
memory: Optional[BaseMemory],
*_: Any, # json_mode and json_mode_strategy are not used here
*args: Any,
) -> str:
data: Dict[str, Any] = {}

if system_prompt is not None:
# Earlier models don't support system prompts, so prepend it to the prompt
if model_id not in ["gemini-1.5-pro"]:
prompt = f"{system_prompt}\n{prompt}"
else:
data["system_instruction"] = {"parts": {"text": system_prompt}}

messages: List[Dict[str, Any]] = []
if isinstance(memory, ChatMemory):
mem_items = memory.unpack("role", "parts", "user", "model")
# Need to do this wrap – see https://ai.google.dev/api/rest/v1beta/cachedContents#Part
messages.extend([{**m, "parts": {"text": m["parts"]}} for m in mem_items])

messages.append({"role": "user", "parts": {"text": prompt}})

data["contents"] = messages
data["generation_config"] = params

result = await llm_post(
client=self.httpx_client,
provider="google",
model_id=model_id,
api_key=self.api_keys["google"],
data=data,
timeout=timeout,
)
result = result["candidates"][0]

# Will sometimes fail due to safety filters
if "content" in result:
return str(result["content"]["parts"][0]["text"])
else:
return str(result)
return await self._generic_openai_spec_call("mistral", *args)

async def _call_replicate(
self,
Expand All @@ -680,6 +643,7 @@ async def _call_replicate(
memory: Optional[BaseMemory],
_: bool, # json_mode is not used here
json_mode_strategy: JsonModeStrategy,
__: Dict[str, Any], # extras is not used here
) -> str:
if isinstance(memory, ChatMemory):
raise LLMOperationError(
@@ -715,30 +679,60 @@ async def _call_octoai(
memory: Optional[BaseMemory],
json_mode: bool,
json_mode_strategy: JsonModeStrategy,
_: Dict[str, Any], # TODO refactor
) -> str:
if isinstance(memory, ChatMemory) and model_id == "mixtral-8x22b-instruct":
raise LLMOperationError(
"Chat memory is not supported with mixtral-8x22b via OctoAI. Try using"
+ " ExternalMemory instead, or ChatMemory with a different model/provider."
)

return await self._generic_openai_spec_call(
"octoai",
model_id,
prompt,
system_prompt,
params,
timeout,
memory,
json_mode,
json_mode_strategy,
{},
)

async def _generic_openai_spec_call(
self,
provider: str,
model_id: str,
prompt: str,
system_prompt: Optional[str],
params: Dict[str, Any],
timeout: Optional[int],
memory: Optional[BaseMemory],
json_mode: bool,
json_mode_strategy: JsonModeStrategy,
extras: Dict[str, Any],
) -> str:
"""Generic call method for providers who follow the OpenAI API spec."""
supports_native_json_mode = "json_mode_arg" in extras

messages = []
if system_prompt is not None:
messages.append({"role": "system", "content": system_prompt})
if isinstance(memory, ChatMemory):
messages.extend(memory.unpack("role", "content", "user", "assistant"))
messages.append({"role": "user", "content": prompt})

if json_mode:
if json_mode and not supports_native_json_mode:
append_msg = get_extra_message(json_mode_strategy)
if append_msg:
messages.append({"role": "assistant", "content": append_msg})

result = await llm_post(
client=self.httpx_client,
provider="octoai",
provider=provider,
model_id=model_id,
api_key=self.api_keys["octoai"],
api_key=self.api_keys[provider],
data={"model": model_id, "messages": messages, **params},
timeout=timeout,
)
