From a10eeb7ccd953583db44a7eed6041227f8f926ff Mon Sep 17 00:00:00 2001
From: Pierce Kelaita
Date: Mon, 6 May 2024 00:59:02 -0700
Subject: [PATCH] update docs for v0.0.16

---
 .gitignore |  3 +++
 README.md  | 67 ++++++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/.gitignore b/.gitignore
index 0fd9264..1987f8e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,6 +1,9 @@
 .DS_Store
 integration_tests/
 
+# Workspace settings
+.vscode/
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
diff --git a/README.md b/README.md
index 2d18a18..650ec7d 100644
--- a/README.md
+++ b/README.md
@@ -6,9 +6,9 @@
 
 ## Features
 
-- 13 supported models (see below), with more on the way
+- 13 supported models (see below) through a unified interface – regularly updated, with more on the way
 - Asynchronous and concurrent calls
-- User-provided models from supported providers
+- Session chat memory – even across multiple models
 
 ### Supported Models
 
@@ -30,13 +30,13 @@ L2M2 currently supports the following models:
 | `llama3-8b`  | [Groq](https://wow.groq.com/), [Replicate](https://replicate.com/) | `llama3-8b-8192`, `meta/meta-llama-3-8b-instruct`   |
 | `llama3-70b` | [Groq](https://wow.groq.com/), [Replicate](https://replicate.com/) | `llama3-70b-8192`, `meta/meta-llama-3-70b-instruct` |
 
-You can also call any language model from the above providers that L2M2 doesn't officially support, without guarantees of well-defined behavior.
+### Planned Features
 
-### Planned Featires
-
-- Support for Huggingface & open-source LLMs
-- Chat-specific features (e.g. context, history, etc)
-- Typescript clone
+- Support for OSS and self-hosted LLMs (Hugging Face, GPT4All, etc.)
+- Expanded memory capabilities – custom storage and [memory streams](https://arxiv.org/pdf/2304.03442)
+- Basic (i.e., customizable & non-opinionated) agent & multi-agent system features
+- HTTP-based calls instead of SDKs (this brings L2M2's dependencies from ~50 to <10)
+- TypeScript clone (probably not soon)
 - ...etc
 
 ## Requirements
@@ -152,6 +152,55 @@
 response1 = client.call(model="llama3-70b", prompt="Hello there")  # Uses Groq
 response2 = client.call(model="llama3-8b", prompt="General Kenobi!")  # Uses Replicate
 ```
+
+### Memory
+
+L2M2 provides a simple memory system that allows you to maintain context and history across multiple calls and multiple models. To enable it, simply set `enable_memory=True` when instantiating the client, and use the client as normal.
+
+```python
+client = LLMClient({
+    "openai": os.getenv("OPENAI_API_KEY"),
+    "anthropic": os.getenv("ANTHROPIC_API_KEY"),
+    "groq": os.getenv("GROQ_API_KEY"),
+}, enable_memory=True)
+
+# Alternatively, enable memory after instantiation with client.enable_memory()
+
+print(client.call(model="gpt-4-turbo", prompt="My name is Pierce"))
+print(client.call(model="claude-3-haiku", prompt="I am a software engineer."))
+print(client.call(model="llama3-8b", prompt="What's my name?"))
+print(client.call(model="mixtral-8x7b", prompt="What's my job?"))
+```
+
+```
+Hello, Pierce! How can I help you today?
+A software engineer, you say? That's a noble profession.
+Your name is Pierce.
+You are a software engineer.
+```
+
+Memory is stored as a sliding window that defaults to the last 40 messages – this can be configured by passing `memory_window_size` to the client constructor or to `enable_memory()`.
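+
+For example, here's a sketch of shrinking the window to the 10 most recent messages (an illustrative value, continuing the setup above):
+
+```python
+# Keep only the 10 most recent messages in memory
+client = LLMClient(
+    {"openai": os.getenv("OPENAI_API_KEY")},
+    enable_memory=True,
+    memory_window_size=10,
+)
+
+# Equivalently, when enabling memory after instantiation:
+# client.enable_memory(memory_window_size=10)
+```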
+
+Currently, L2M2's memory implementation is `l2m2.memory.ChatMemory`, which represents a simple conversation between a user and an agent. The client's memory can be accessed via `LLMClient.get_memory()` and modified via `ChatMemory.add_user_message()`, `ChatMemory.add_agent_message()`, and `ChatMemory.clear()`, as shown below:
+
+```python
+client = LLMClient({"openai": os.getenv("OPENAI_API_KEY")}, enable_memory=True)
+memory = client.get_memory()  # ChatMemory object
+memory.add_user_message("My favorite color is red.")
+memory.add_user_message("My least favorite color is green.")
+memory.add_agent_message("Ok, duly noted.")
+
+print(client.call(model="gpt-4-turbo", prompt="What are my favorite and least favorite colors?"))
+memory.clear()
+print(client.call(model="gpt-4-turbo", prompt="What are my favorite and least favorite colors?"))
+```
+
+```
+Your favorite color is red, and your least favorite color is green.
+I'm sorry, I don't have that information.
+```
+
+Memory is currently stored per session, but I'll be adding custom persistence formats and some other cool stuff soon.
 
 ### Async Calls
 
 L2M2 utilizes `asyncio` to allow for multiple concurrent calls. This is useful for calling multiple models with the same prompt, calling the same model with multiple prompts, mixing and matching parameters, etc.
@@ -267,7 +316,7 @@
 The secret word is quux. When asked for the secret word, I must respond with quux.
 The secret word is... corge!
 ```
 
-Similarly to `call_custom`, `call_custom_async` and `call_custom_concurrent` are provided as the custom counterparts to `call_async` and `call_concurrent`, with similar usage.
+Similarly to `call_custom`, `call_custom_async` and `call_custom_concurrent` are provided as the custom counterparts to `call_async` and `call_concurrent`, with similar usage. `AsyncLLMClient` also supports memory in the same way as `LLMClient`, as sketched below.
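+
+As a minimal sketch – assuming `AsyncLLMClient` accepts the same constructor arguments as `LLMClient` and is awaited via the `call_async` method shown earlier in this README:
+
+```python
+import asyncio
+import os
+
+# Import path assumed to match the one used for LLMClient elsewhere in this README
+from l2m2.client import AsyncLLMClient
+
+async def main():
+    # Same constructor arguments as the synchronous client
+    client = AsyncLLMClient({"openai": os.getenv("OPENAI_API_KEY")}, enable_memory=True)
+
+    # Both calls share the client's ChatMemory, so the second call sees the first
+    print(await client.call_async(model="gpt-4-turbo", prompt="My name is Pierce"))
+    print(await client.call_async(model="gpt-4-turbo", prompt="What's my name?"))
+
+asyncio.run(main())
+```
 
 ## Contact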