
Commit

v0.0.30 (#11)
pkelaita authored Aug 6, 2024
2 parents 7078afd + ee6adc2 commit 64b32fd
Showing 7 changed files with 217 additions and 132 deletions.
10 changes: 9 additions & 1 deletion CHANGELOG.md
@@ -1,9 +1,17 @@
# Changelog

_Current version: 0.0.29_
_Current version: 0.0.30_

[PyPi link](https://pypi.org/project/l2m2/)

### 0.0.30 - August 5, 2024

#### Added

- [Mistral](https://mistral.ai/) provider support via La Plateforme.
- [Mistral Large 2](https://mistral.ai/news/mistral-large-2407/) model availability from Mistral.
- Mistral 7B, Mixtral 8x7B, and Mixtral 8x22B model availability from Mistral, in addition to the existing providers.

### 0.0.29 - August 4, 2024

> [!CAUTION]
46 changes: 26 additions & 20 deletions README.md
@@ -1,12 +1,12 @@
# L2M2: A Simple Python LLM Manager 💬👍

[![Tests](https://github.com/pkelaita/l2m2/actions/workflows/tests.yml/badge.svg?timestamp=1722833303)](https://github.com/pkelaita/l2m2/actions/workflows/tests.yml) [![codecov](https://codecov.io/github/pkelaita/l2m2/graph/badge.svg?token=UWIB0L9PR8)](https://codecov.io/github/pkelaita/l2m2) [![PyPI version](https://badge.fury.io/py/l2m2.svg?timestamp=1722833303)](https://badge.fury.io/py/l2m2)
[![Tests](https://github.com/pkelaita/l2m2/actions/workflows/tests.yml/badge.svg?timestamp=1722903983)](https://github.com/pkelaita/l2m2/actions/workflows/tests.yml) [![codecov](https://codecov.io/github/pkelaita/l2m2/graph/badge.svg?token=UWIB0L9PR8)](https://codecov.io/github/pkelaita/l2m2) [![PyPI version](https://badge.fury.io/py/l2m2.svg?timestamp=1722903983)](https://badge.fury.io/py/l2m2)

**L2M2** ("LLM Manager" → "LLMM" → "L2M2") is a tiny and very simple LLM manager for Python that exposes lots of models through a unified API. This is useful for evaluations, demos, production applications, and other projects that need to be easily model-agnostic.
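For orientation, here is a minimal usage sketch. It is not part of this commit: `LLMClient()` and `print(response)` appear as diff context further down, and the import path is inferred from the `l2m2/client/` package, but the exact `call` signature shown here is an assumption.

```python
# Minimal sketch, not taken from the diff: assumes LLMClient() picks up provider
# API keys from the environment and exposes a call(model=..., prompt=...) method.
from l2m2.client import LLMClient

client = LLMClient()

response = client.call(
    model="gpt-4o",  # any supported model key from the table below
    prompt="What is the capital of France?",
    system_prompt="Answer in one short sentence.",  # assumed optional parameter
)
print(response)
```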

### Features

- <!--start-count-->21<!--end-count--> supported models (see below) – regularly updated and with more on the way.
- <!--start-count-->22<!--end-count--> supported models (see below) – regularly updated and with more on the way.
- Session chat memory – even across multiple models or with concurrent memory streams.
- JSON mode
- Prompt loading tools
@@ -37,9 +37,10 @@ L2M2 currently supports the following models:
| `claude-3-haiku` | [Anthropic](https://www.anthropic.com/api) | `claude-3-haiku-20240307` |
| `command-r` | [Cohere](https://docs.cohere.com/) | `command-r` |
| `command-r-plus` | [Cohere](https://docs.cohere.com/) | `command-r-plus` |
| `mistral-7b` | [OctoAI](https://octoai.cloud/) | `mistral-7b-instruct` |
| `mixtral-8x7b` | [Groq](https://wow.groq.com/), [OctoAI](https://octoai.cloud/) | `mixtral-8x7b-32768`, `mixtral-8x7b-instruct` |
| `mixtral-8x22b` | [OctoAI](https://octoai.cloud/) | `mixtral-8x22b-instruct` |
| `mixtral-large-2` | [Mistral](https://mistral.ai/) | `mistral-large-latest` |
| `mixtral-8x22b` | [Mistral](https://mistral.ai/), [OctoAI](https://octoai.cloud/) | `open-mixtral-8x22b`, `mixtral-8x22b-instruct` |
| `mixtral-8x7b` | [Mistral](https://mistral.ai/), [OctoAI](https://octoai.cloud/), [Groq](https://wow.groq.com/) | `open-mixtral-8x7b`, `mixtral-8x7b-instruct`, `mixtral-8x7b-32768` |
| `mistral-7b` | [Mistral](https://mistral.ai/), [OctoAI](https://octoai.cloud/) | `open-mistral-7b`, `mistral-7b-instruct` |
| `gemma-7b` | [Groq](https://wow.groq.com/) | `gemma-7b-it` |
| `llama3-8b` | [Groq](https://wow.groq.com/), [Replicate](https://replicate.com/) | `llama3-8b-8192`, `meta/meta-llama-3-8b-instruct` |
| `llama3-70b` | [Groq](https://wow.groq.com/), [Replicate](https://replicate.com/), [OctoAI](https://octoai.cloud/) | `llama3-70b-8192`, `meta/meta-llama-3-70b-instruct`, `meta-llama-3-70b-instruct` |
@@ -92,15 +93,16 @@ client = LLMClient()

To activate any of the providers, set the provider's API key in the corresponding environment variable shown below, and L2M2 will read it in to activate the provider.

| Provider | Environment Variable |
| --------- | --------------------- |
| OpenAI | `OPENAI_API_KEY` |
| Anthropic | `ANTHROPIC_API_KEY` |
| Cohere | `CO_API_KEY` |
| Google | `GOOGLE_API_KEY` |
| Groq | `GROQ_API_KEY` |
| Replicate | `REPLICATE_API_TOKEN` |
| OctoAI | `OCTOAI_TOKEN` |
| Provider | Environment Variable |
| ----------------------- | --------------------- |
| OpenAI | `OPENAI_API_KEY` |
| Anthropic | `ANTHROPIC_API_KEY` |
| Cohere | `CO_API_KEY` |
| Google | `GOOGLE_API_KEY` |
| Groq | `GROQ_API_KEY` |
| Replicate | `REPLICATE_API_TOKEN` |
| OctoAI | `OCTOAI_TOKEN` |
| Mistral (La Plateforme) | `MISTRAL_API_KEY` |
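
For example, a sketch of activating the newly added Mistral provider from Python rather than the shell is shown below; that the key is read when the client is constructed is an assumption about load order.

```python
# Sketch only: assumes L2M2 reads MISTRAL_API_KEY (see the table above)
# at client construction time to activate Mistral (La Plateforme).
import os

from l2m2.client import LLMClient

os.environ["MISTRAL_API_KEY"] = "<your-mistral-api-key>"  # normally exported in your shell

client = LLMClient()
```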

Additionally, you can activate providers programmatically as follows:
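
The concrete example is collapsed in this diff; the sketch below assumes an `add_provider(provider_name, api_key)` method, which is not visible in the shown lines.

```python
# Assumed API - the actual example lives in the collapsed portion of README.md.
from l2m2.client import LLMClient

client = LLMClient()
client.add_provider("mistral", "<your-mistral-api-key>")
```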

@@ -497,15 +499,19 @@ print(response)
> [!IMPORTANT]
> Regardless of the model, and even when `json_mode` is enabled, it's crucial to ensure that either the prompt or the system prompt instructs the model to return its output in JSON - and ideally specifies the JSON format, as shown above.
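
A hedged illustration of that guidance follows; the `json_mode` flag matches the parameter handled in this commit's client code, while the surrounding `call` usage is assumed.

```python
# Sketch: the prompt itself asks for JSON and spells out the expected keys,
# while json_mode=True requests native JSON output where the provider supports it.
response = client.call(
    model="mistral-7b",
    prompt='Describe Paris as JSON like {"city": ..., "country": ..., "population": ...}',
    json_mode=True,
)
print(response)
```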

The following models natively support JSON mode:
The following models natively support JSON mode via the given provider:

<!--start-json-native-->

- `gpt-4o` (Openai)
- `gpt-4o-mini` (Openai)
- `gpt-4-turbo` (Openai)
- `gpt-3.5-turbo` (Openai)
- `gemini-1.5-pro` (Google)
- `gpt-4o` (via Openai)
- `gpt-4o-mini` (via Openai)
- `gpt-4-turbo` (via Openai)
- `gpt-3.5-turbo` (via Openai)
- `gemini-1.5-pro` (via Google)
- `mixtral-large-2` (via Mistral)
- `mixtral-8x22b` (via Mistral)
- `mixtral-8x7b` (via Mistral)
- `mistral-7b` (via Mistral)

<!--end-json-native-->

2 changes: 1 addition & 1 deletion l2m2/__init__.py
@@ -1 +1 @@
__version__ = "0.0.29"
__version__ = "0.0.30"
162 changes: 78 additions & 84 deletions l2m2/client/base_llm_client.py
@@ -36,6 +36,7 @@
"groq": "GROQ_API_KEY",
"replicate": "REPLICATE_API_TOKEN",
"octoai": "OCTOAI_TOKEN",
"mistral": "MISTRAL_API_KEY",
}


@@ -487,6 +488,7 @@ async def _call_impl(
memory,
json_mode,
json_mode_strategy,
model_info["extras"],
)

# Handle JSON mode strategies for the output (but only if we don't have native support)
@@ -501,30 +503,56 @@
return str(result)

async def _call_openai(
self,
*args: Any,
) -> str:
return await self._generic_openai_spec_call("openai", *args)

async def _call_google(
self,
model_id: str,
prompt: str,
system_prompt: Optional[str],
params: Dict[str, Any],
timeout: Optional[int],
memory: Optional[BaseMemory],
*_: Any, # json_mode and json_mode_strategy are not used here
*_: Any,  # json_mode, json_mode_strategy, and extras are not used here
) -> str:
messages = []
data: Dict[str, Any] = {}

if system_prompt is not None:
messages.append({"role": "system", "content": system_prompt})
# Earlier models don't support system prompts, so prepend it to the prompt
if model_id not in ["gemini-1.5-pro"]:
prompt = f"{system_prompt}\n{prompt}"
else:
data["system_instruction"] = {"parts": {"text": system_prompt}}

messages: List[Dict[str, Any]] = []
if isinstance(memory, ChatMemory):
messages.extend(memory.unpack("role", "content", "user", "assistant"))
messages.append({"role": "user", "content": prompt})
mem_items = memory.unpack("role", "parts", "user", "model")
# Need to do this wrap – see https://ai.google.dev/api/rest/v1beta/cachedContents#Part
messages.extend([{**m, "parts": {"text": m["parts"]}} for m in mem_items])

messages.append({"role": "user", "parts": {"text": prompt}})

data["contents"] = messages
data["generation_config"] = params

result = await llm_post(
client=self.httpx_client,
provider="openai",
provider="google",
model_id=model_id,
api_key=self.api_keys["openai"],
data={"model": model_id, "messages": messages, **params},
api_key=self.api_keys["google"],
data=data,
timeout=timeout,
)
return str(result["choices"][0]["message"]["content"])
result = result["candidates"][0]

# Will sometimes fail due to safety filters
if "content" in result:
return str(result["content"]["parts"][0]["text"])
else:
return str(result)

async def _call_anthropic(
self,
Expand All @@ -536,6 +564,7 @@ async def _call_anthropic(
memory: Optional[BaseMemory],
json_mode: bool,
json_mode_strategy: JsonModeStrategy,
_: Dict[str, Any], # extras is not used here
) -> str:
if system_prompt is not None:
params["system"] = system_prompt
@@ -569,6 +598,7 @@ async def _call_cohere(
memory: Optional[BaseMemory],
json_mode: bool,
json_mode_strategy: JsonModeStrategy,
_: Dict[str, Any], # extras is not used here
) -> str:
if system_prompt is not None:
params["preamble"] = system_prompt
@@ -593,82 +623,15 @@

async def _call_groq(
self,
model_id: str,
prompt: str,
system_prompt: Optional[str],
params: Dict[str, Any],
timeout: Optional[int],
memory: Optional[BaseMemory],
json_mode: bool,
json_mode_strategy: JsonModeStrategy,
*args: Any,
) -> str:
messages = []
if system_prompt is not None:
messages.append({"role": "system", "content": system_prompt})
if isinstance(memory, ChatMemory):
messages.extend(memory.unpack("role", "content", "user", "assistant"))
messages.append({"role": "user", "content": prompt})

if json_mode:
append_msg = get_extra_message(json_mode_strategy)
if append_msg:
messages.append({"role": "assistant", "content": append_msg})

result = await llm_post(
client=self.httpx_client,
provider="groq",
model_id=model_id,
api_key=self.api_keys["groq"],
data={"model": model_id, "messages": messages, **params},
timeout=timeout,
)
return str(result["choices"][0]["message"]["content"])
return await self._generic_openai_spec_call("groq", *args)

async def _call_google(
async def _call_mistral(
self,
model_id: str,
prompt: str,
system_prompt: Optional[str],
params: Dict[str, Any],
timeout: Optional[int],
memory: Optional[BaseMemory],
*_: Any, # json_mode and json_mode_strategy are not used here
*args: Any,
) -> str:
data: Dict[str, Any] = {}

if system_prompt is not None:
# Earlier models don't support system prompts, so prepend it to the prompt
if model_id not in ["gemini-1.5-pro"]:
prompt = f"{system_prompt}\n{prompt}"
else:
data["system_instruction"] = {"parts": {"text": system_prompt}}

messages: List[Dict[str, Any]] = []
if isinstance(memory, ChatMemory):
mem_items = memory.unpack("role", "parts", "user", "model")
# Need to do this wrap – see https://ai.google.dev/api/rest/v1beta/cachedContents#Part
messages.extend([{**m, "parts": {"text": m["parts"]}} for m in mem_items])

messages.append({"role": "user", "parts": {"text": prompt}})

data["contents"] = messages
data["generation_config"] = params

result = await llm_post(
client=self.httpx_client,
provider="google",
model_id=model_id,
api_key=self.api_keys["google"],
data=data,
timeout=timeout,
)
result = result["candidates"][0]

# Will sometimes fail due to safety filters
if "content" in result:
return str(result["content"]["parts"][0]["text"])
else:
return str(result)
return await self._generic_openai_spec_call("mistral", *args)

async def _call_replicate(
self,
Expand All @@ -680,6 +643,7 @@ async def _call_replicate(
memory: Optional[BaseMemory],
_: bool, # json_mode is not used here
json_mode_strategy: JsonModeStrategy,
__: Dict[str, Any], # extras is not used here
) -> str:
if isinstance(memory, ChatMemory):
raise LLMOperationError(
@@ -715,30 +679,60 @@ async def _call_octoai(
memory: Optional[BaseMemory],
json_mode: bool,
json_mode_strategy: JsonModeStrategy,
_: Dict[str, Any], # TODO refactor
) -> str:
if isinstance(memory, ChatMemory) and model_id == "mixtral-8x22b-instruct":
raise LLMOperationError(
"Chat memory is not supported with mixtral-8x22b via OctoAI. Try using"
+ " ExternalMemory instead, or ChatMemory with a different model/provider."
)

return await self._generic_openai_spec_call(
"octoai",
model_id,
prompt,
system_prompt,
params,
timeout,
memory,
json_mode,
json_mode_strategy,
{},
)

async def _generic_openai_spec_call(
self,
provider: str,
model_id: str,
prompt: str,
system_prompt: Optional[str],
params: Dict[str, Any],
timeout: Optional[int],
memory: Optional[BaseMemory],
json_mode: bool,
json_mode_strategy: JsonModeStrategy,
extras: Dict[str, Any],
) -> str:
"""Generic call method for providers who follow the OpenAI API spec."""
supports_native_json_mode = "json_mode_arg" in extras

messages = []
if system_prompt is not None:
messages.append({"role": "system", "content": system_prompt})
if isinstance(memory, ChatMemory):
messages.extend(memory.unpack("role", "content", "user", "assistant"))
messages.append({"role": "user", "content": prompt})

if json_mode:
if json_mode and not supports_native_json_mode:
append_msg = get_extra_message(json_mode_strategy)
if append_msg:
messages.append({"role": "assistant", "content": append_msg})

result = await llm_post(
client=self.httpx_client,
provider="octoai",
provider=provider,
model_id=model_id,
api_key=self.api_keys["octoai"],
api_key=self.api_keys[provider],
data={"model": model_id, "messages": messages, **params},
timeout=timeout,
)
