diff --git a/docs/integrations/anthropic.md b/docs/integrations/anthropic.md new file mode 100644 index 000000000..a11552c52 --- /dev/null +++ b/docs/integrations/anthropic.md @@ -0,0 +1,258 @@ +--- +title: "Structured outputs with Anthropic, a complete guide w/ instructor" +description: "Complete guide to using Instructor with Anthropic's Claude models. Learn how to generate structured, type-safe outputs with state-of-the-art AI capabilities." +--- + +# Structured outputs with Anthropic + +Anthropic's Claude models offer powerful language capabilities with a focus on safety and reliability. This guide shows you how to use Instructor with Anthropic's models for type-safe, validated responses. + +## Quick Start + +Install Instructor with Anthropic support: + +```bash +pip install "instructor[anthropic]" +``` + +## Simple User Example (Sync) + +```python +from anthropic import Anthropic +import instructor +from pydantic import BaseModel + +# Initialize the client +client = Anthropic(api_key="your_anthropic_api_key") + +# Enable instructor patches +client = instructor.from_anthropic(client) + +class User(BaseModel): + name: str + age: int + +# Create structured output +user = client.messages.create( + model="claude-3-opus-20240229", # or other available models + messages=[ + {"role": "user", "content": "Extract: Jason is 25 years old"}, + ], + response_model=User, +) + +print(user) # User(name='Jason', age=25) +``` + +## Simple User Example (Async) + +```python +from anthropic import AsyncAnthropic +import instructor +from pydantic import BaseModel +import asyncio + +# Initialize async client +client = AsyncAnthropic(api_key="your_anthropic_api_key") + +# Enable instructor patches +client = instructor.from_anthropic(client) + +class User(BaseModel): + name: str + age: int + +async def extract_user(): + user = await client.messages.create( + model="claude-3-opus-20240229", + messages=[ + {"role": "user", "content": "Extract: Jason is 25 years old"}, + ], + response_model=User, + ) + return user + +# Run async function +user = asyncio.run(extract_user()) +print(user) # User(name='Jason', age=25) +``` + +## Nested Example + +```python +from pydantic import BaseModel +from typing import List + +class Address(BaseModel): + street: str + city: str + country: str + +class User(BaseModel): + name: str + age: int + addresses: List[Address] + +# Create structured output with nested objects +user = client.messages.create( + model="claude-3-opus-20240229", + messages=[ + {"role": "user", "content": """ + Extract: Jason is 25 years old. + He lives at 123 Main St, New York, USA + and has a summer house at 456 Beach Rd, Miami, USA + """}, + ], + response_model=User, +) + +print(user) # User with nested Address objects +``` + +## Streaming Support + +Anthropic's Claude models provide comprehensive streaming support through Instructor: + +### Available Streaming Methods + +1. **Basic Streaming**: ✅ Fully supported +2. **Iterable Streaming**: ✅ Fully supported +3. 
**Async Support**: ✅ Available for all streaming operations + +```python +from typing import List +import asyncio +from anthropic import AsyncAnthropic +import instructor + +class User(BaseModel): + name: str + age: int + +async def process_users(): + client = AsyncAnthropic(api_key="your_anthropic_api_key") + client = instructor.from_anthropic(client) + + # Example of basic streaming + async for partial_user in client.messages.create_partial( + model="claude-3-opus-20240229", + messages=[ + {"role": "user", "content": "Extract: Jason is 25 years old"}, + ], + response_model=User, + ): + print(f"Partial result: {partial_user}") + + # Example of iterable streaming + users = client.messages.create_iterable( + model="claude-3-opus-20240229", + messages=[ + {"role": "user", "content": """ + Extract users: + 1. Jason is 25 years old + 2. Sarah is 30 years old + 3. Mike is 28 years old + """}, + ], + response_model=User, + ) + + async for user in users: + print(f"User: {user}") + +# Run the async function +asyncio.run(process_users()) +``` + +This implementation provides efficient streaming capabilities for both single and multiple object extraction tasks. + +## Instructor Hooks + +Instructor provides several hooks to customize behavior: + +### Validation Hook + +```python +from instructor import Instructor + +def validation_hook(value, retry_count, exception): + print(f"Validation failed {retry_count} times: {exception}") + return retry_count < 3 # Retry up to 3 times + +instructor.patch(client, validation_hook=validation_hook) +``` + +### Mode Hooks + +```python +from instructor import Mode + + +# Use different modes for different scenarios +client = instructor.patch(client, mode=Mode.JSON) # JSON mode +client = instructor.patch(client, mode=Mode.TOOLS) # Tools mode +client = instructor.patch(client, mode=Mode.MD_JSON) # Markdown JSON mode +``` + +### Custom Retrying + +```python +from instructor import RetryConfig + +client = instructor.patch( + client, + retry_config=RetryConfig( + max_retries=3, + on_retry=lambda *args: print("Retrying..."), + ) +) +``` + +## Available Models + +Anthropic offers several Claude models: +- Claude 3 Opus (Most capable) +- Claude 3 Sonnet (Balanced performance) +- Claude 3 Haiku (Fast and efficient) +- Claude 2.1 +- Claude 2.0 +- Claude Instant + +## Best Practices + +1. **Model Selection** + - Choose model based on task complexity + - Consider latency requirements + - Monitor token usage and costs + - Use appropriate context lengths + +2. **Optimization Tips** + - Structure prompts effectively + - Use system messages appropriately + - Implement caching strategies + - Monitor API usage + +3. **Error Handling** + - Implement proper validation + - Handle rate limits gracefully + - Monitor model responses + - Use appropriate timeout settings + +## Common Use Cases + +- Data Extraction +- Content Generation +- Document Analysis +- Complex Reasoning Tasks +- Multi-step Processing + +## Related Resources + +- [Anthropic API Documentation](https://docs.anthropic.com/) +- [Instructor Core Concepts](../concepts/index.md) +- [Type Validation Guide](../concepts/validation.md) +- [Advanced Usage Examples](../examples/index.md) + +## Updates and Compatibility + +Instructor maintains compatibility with Anthropic's latest API versions. Check the [changelog](../../CHANGELOG.md) for updates. 
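To make the error-handling advice above concrete, here is a minimal sketch of retrying and rate-limit handling. It assumes the `max_retries` keyword that Instructor adds to patched create calls and the `RateLimitError` / `APITimeoutError` exceptions exposed by the `anthropic` SDK; adapt the fallback logic to your application.

```python
import anthropic
import instructor
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

client = instructor.from_anthropic(anthropic.Anthropic(api_key="your_anthropic_api_key"))

try:
    user = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
        response_model=User,
        max_retries=3,  # re-ask the model when validation fails, up to 3 times
    )
    print(user)
except anthropic.RateLimitError:
    # Back off and retry later, or route the request to a smaller model
    raise
except anthropic.APITimeoutError:
    # Retry with a shorter prompt or surface the timeout to the caller
    raise
```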
diff --git a/docs/integrations/anyscale.md b/docs/integrations/anyscale.md new file mode 100644 index 000000000..16d511782 --- /dev/null +++ b/docs/integrations/anyscale.md @@ -0,0 +1,291 @@ +--- +title: "Structured outputs with Anyscale, a complete guide w/ instructor" +description: "Complete guide to using Instructor with Anyscale's LLM endpoints. Learn how to generate structured, type-safe outputs with Anyscale's powerful hosted models." +--- + +# Structured outputs with Anyscale, a complete guide w/ instructor + +Anyscale provides hosted endpoints for various open-source models, offering a reliable platform for structured output generation. This guide shows you how to use Instructor with Anyscale's endpoints for type-safe, validated responses. + +## Quick Start + +Install Instructor with OpenAI compatibility (Anyscale uses OpenAI-compatible endpoints): + +```bash +pip install "instructor[openai]" +``` + +⚠️ **Important**: You must set your Anyscale API key before using the client. You can do this in two ways: + +1. Set the environment variable: +```bash +export ANYSCALE_API_KEY='your_anyscale_api_key' +``` + +2. Or provide it directly to the client: +```python +import os +from openai import OpenAI + +# Configure OpenAI client with Anyscale endpoint +client = OpenAI( + api_key=os.getenv('ANYSCALE_API_KEY', 'your_anyscale_api_key'), + base_url="https://api.endpoints.anyscale.com/v1" +) +``` + +## Simple User Example (Sync) + +```python +import openai +import instructor +from pydantic import BaseModel + +# Enable instructor patches +client = instructor.from_openai(client) + +class User(BaseModel): + name: str + age: int + +# Create structured output +user = client.chat.completions.create( + model="meta-llama/Llama-2-70b-chat-hf", # or other available models + messages=[ + {"role": "user", "content": "Extract: Jason is 25 years old"}, + ], + response_model=User, +) + +print(user) # User(name='Jason', age=25) +``` + +## Simple User Example (Async) + +```python +import openai +import instructor +from pydantic import BaseModel +import asyncio + +# Configure async OpenAI client with Anyscale endpoint +client = openai.AsyncOpenAI( + api_key="your_anyscale_api_key", + base_url="https://api.endpoints.anyscale.com/v1" +) + +# Enable instructor patches +client = instructor.from_openai(client) + +class User(BaseModel): + name: str + age: int + +async def extract_user(): + user = await client.chat.completions.create( + model="meta-llama/Llama-2-70b-chat-hf", + messages=[ + {"role": "user", "content": "Extract: Jason is 25 years old"}, + ], + response_model=User, + ) + return user + +# Run async function +user = asyncio.run(extract_user()) +print(user) # User(name='Jason', age=25) +``` + +## Nested Example + +```python +from pydantic import BaseModel +from typing import List + +class Address(BaseModel): + street: str + city: str + country: str + +class User(BaseModel): + name: str + age: int + addresses: List[Address] + +# Create structured output with nested objects +user = client.chat.completions.create( + model="meta-llama/Llama-2-70b-chat-hf", + messages=[ + {"role": "user", "content": """ + Extract: Jason is 25 years old. 
+ He lives at 123 Main St, New York, USA + and has a summer house at 456 Beach Rd, Miami, USA + """}, + ], + response_model=User, +) + +print(user) # User with nested Address objects +``` + +## Streaming Support + +Anyscale provides streaming support through their OpenAI-compatible endpoints, with some limitations: + +- **Full Streaming**: ✅ Supported +- **Partial Streaming**: ⚠️ Limited support (may experience inconsistent behavior) +- **Iterable Streaming**: ✅ Supported +- **Async Support**: ✅ Supported + +### Error Handling for Streaming + +```python +from openai import OpenAIError +import os + +class User(BaseModel): + name: str + age: int + bio: str + +try: + # Stream partial objects as they're generated + for partial_user in client.chat.completions.create_partial( + model="meta-llama/Llama-2-70b-chat-hf", + messages=[ + {"role": "user", "content": "Create a user profile for Jason, age 25"}, + ], + response_model=User, + ): + print(f"Current state: {partial_user}") +except OpenAIError as e: + if "api_key" in str(e).lower(): + print("Error: Invalid or missing Anyscale API key. Please check your ANYSCALE_API_KEY.") + elif "rate_limit" in str(e).lower(): + print("Error: Rate limit exceeded. Please wait before retrying.") + else: + print(f"OpenAI API error: {str(e)}") +except Exception as e: + print(f"Unexpected error: {str(e)}") +``` + +**Important Notes on Streaming:** +- Full streaming is supported for complete response generation +- Partial streaming has limited support and may not work consistently across all models +- Some models may exhibit slower streaming performance +- For production use, thoroughly test streaming capabilities with your specific model +- Consider implementing fallback mechanisms for partial streaming scenarios +- Monitor streaming performance and implement appropriate error handling +- Handle API key and rate limit errors appropriately + +## Iterable Example + +```python +from typing import List + +class User(BaseModel): + name: str + age: int + +# Extract multiple users from text +users = client.chat.completions.create_iterable( + model="meta-llama/Llama-2-70b-chat-hf", + messages=[ + {"role": "user", "content": """ + Extract users: + 1. Jason is 25 years old + 2. Sarah is 30 years old + 3. Mike is 28 years old + """}, + ], + response_model=User, +) + +for user in users: + print(user) # Prints each user as it's extracted +``` + +## Instructor Hooks + +Instructor provides several hooks to customize behavior: + +### Validation Hook + +```python +from instructor import Instructor + +def validation_hook(value, retry_count, exception): + print(f"Validation failed {retry_count} times: {exception}") + return retry_count < 3 # Retry up to 3 times + +instructor.patch(client, validation_hook=validation_hook) +``` + +### Mode Hooks + +```python +from instructor import Mode + +# Use different modes for different scenarios +client = instructor.patch(client, mode=Mode.JSON) # JSON mode +client = instructor.patch(client, mode=Mode.TOOLS) # Tools mode +client = instructor.patch(client, mode=Mode.MD_JSON) # Markdown JSON mode +``` + +### Custom Retrying + +```python +from instructor import RetryConfig + +client = instructor.patch( + client, + retry_config=RetryConfig( + max_retries=3, + on_retry=lambda *args: print("Retrying..."), + ) +) +``` + +## Available Models + +Anyscale provides access to various open-source models: +- Llama 2 (7B, 13B, 70B variants) +- CodeLlama +- Mistral +- Other open-source models + +## Best Practices + +1. 
**Model Selection** + - Choose model size based on task complexity + - Consider latency requirements + - Monitor token usage and costs + +2. **Optimization Tips** + - Use appropriate batch sizes + - Implement caching strategies + - Monitor API usage + +3. **Error Handling** + - Implement proper validation + - Handle rate limits gracefully + - Monitor model responses + +## Common Use Cases + +- Data Extraction +- Content Generation +- Document Analysis +- API Response Formatting +- Configuration Generation + +## Related Resources + +- [Anyscale Endpoints Documentation](https://docs.endpoints.anyscale.com/) +- [Instructor Core Concepts](../concepts/index.md) +- [Type Validation Guide](../concepts/validation.md) +- [Advanced Usage Examples](../examples/index.md) + +## Updates and Compatibility + +Instructor maintains compatibility with Anyscale's OpenAI-compatible endpoints. Check the [changelog](../../CHANGELOG.md) for updates. diff --git a/docs/integrations/cerebras.md b/docs/integrations/cerebras.md new file mode 100644 index 000000000..cc5f585cf --- /dev/null +++ b/docs/integrations/cerebras.md @@ -0,0 +1,223 @@ +--- +title: "Cerebras Integration with Instructor | Structured Output Guide" +description: "Complete guide to using Instructor with Cerebras's hardware-accelerated AI models. Learn how to generate structured, type-safe outputs with high-performance computing." +--- + +# Cerebras Integration with Instructor + +Cerebras provides hardware-accelerated AI models optimized for high-performance computing environments. This guide shows you how to use Instructor with Cerebras's models for type-safe, validated responses. + +## Quick Start + +Install Instructor with Cerebras support: + +```bash +pip install "instructor[cerebras]" +``` + +## Simple User Example (Sync) + +```python +from cerebras.client import Client +import instructor +from pydantic import BaseModel + +# Initialize the client +client = Client(api_key='your_api_key') + +# Enable instructor patches +client = instructor.from_cerebras(client) + +class User(BaseModel): + name: str + age: int + +# Create structured output +user = client.generate( + prompt="Extract: Jason is 25 years old", + model='cerebras/btlm-3b-8k', # or other available models + response_model=User, +) + +print(user) # User(name='Jason', age=25) +``` + +## Simple User Example (Async) + +```python +from cerebras.client import AsyncClient +import instructor +from pydantic import BaseModel +import asyncio + +# Initialize async client +client = AsyncClient(api_key='your_api_key') + +# Enable instructor patches +client = instructor.from_cerebras(client) + +class User(BaseModel): + name: str + age: int + +async def extract_user(): + user = await client.generate( + prompt="Extract: Jason is 25 years old", + model='cerebras/btlm-3b-8k', + response_model=User, + ) + return user + +# Run async function +user = asyncio.run(extract_user()) +print(user) # User(name='Jason', age=25) +``` + +## Nested Example + +```python +from pydantic import BaseModel +from typing import List + +class Address(BaseModel): + street: str + city: str + country: str + +class User(BaseModel): + name: str + age: int + addresses: List[Address] + +# Create structured output with nested objects +user = client.generate( + prompt=""" + Extract: Jason is 25 years old. 
+ He lives at 123 Main St, New York, USA + and has a summer house at 456 Beach Rd, Miami, USA + """, + model='cerebras/btlm-3b-8k', + response_model=User, +) + +print(user) # User with nested Address objects +``` + +## Partial Streaming Example + +Note: Cerebras's current API does not support partial streaming of structured responses. The streaming functionality returns complete text chunks rather than partial objects. We recommend using the standard synchronous or asynchronous methods for structured output generation. + +## Iterable Example + +```python +from typing import List + +class User(BaseModel): + name: str + age: int + +# Extract multiple users from text +users = client.generate_iterable( + prompt=""" + Extract users: + 1. Jason is 25 years old + 2. Sarah is 30 years old + 3. Mike is 28 years old + """, + model='cerebras/btlm-3b-8k', + response_model=User, +) + +for user in users: + print(user) # Prints each user as it's extracted +``` + +## Instructor Hooks + +Instructor provides several hooks to customize behavior: + +### Validation Hook + +```python +from instructor import Instructor + +def validation_hook(value, retry_count, exception): + print(f"Validation failed {retry_count} times: {exception}") + return retry_count < 3 # Retry up to 3 times + +instructor.patch(client, validation_hook=validation_hook) +``` + +### Mode Hooks + +```python +from instructor import Mode + +# Use different modes for different scenarios +client = instructor.patch(client, mode=Mode.JSON) # JSON mode +client = instructor.patch(client, mode=Mode.TOOLS) # Tools mode +client = instructor.patch(client, mode=Mode.MD_JSON) # Markdown JSON mode +``` + +### Custom Retrying + +```python +from instructor import RetryConfig + +client = instructor.patch( + client, + retry_config=RetryConfig( + max_retries=3, + on_retry=lambda *args: print("Retrying..."), + ) +) +``` + +## Available Models + +Cerebras offers several model options: +- BTLM-3B-8K +- BTLM-7B-8K +- Custom-trained models +- Enterprise deployments + +## Best Practices + +1. **Model Selection** + - Choose model based on performance needs + - Consider hardware requirements + - Monitor resource usage + - Use appropriate model sizes + +2. **Optimization Tips** + - Leverage hardware acceleration + - Optimize batch processing + - Implement caching strategies + - Monitor system resources + +3. **Error Handling** + - Implement proper validation + - Handle hardware-specific errors + - Monitor model responses + - Use appropriate timeout settings + +## Common Use Cases + +- High-Performance Computing +- Large-Scale Processing +- Enterprise Deployments +- Research Applications +- Batch Processing + +## Related Resources + +- [Cerebras Documentation](https://docs.cerebras.ai/) +- [Instructor Core Concepts](../concepts/index.md) +- [Type Validation Guide](../concepts/validation.md) +- [Advanced Usage Examples](../examples/index.md) + +## Updates and Compatibility + +Instructor maintains compatibility with Cerebras's latest API versions. Check the [changelog](../../CHANGELOG.md) for updates. + +Note: Some features like partial streaming may not be available due to API limitations. Always check the latest documentation for feature availability. 
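One way to apply the caching strategy mentioned under Optimization Tips is to memoize extractions keyed on the input text, so repeated prompts never leave the process. A minimal sketch that reuses the patched `client` and the `generate` call pattern from the examples above (the cache size and model name are illustrative):

```python
from functools import lru_cache
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

@lru_cache(maxsize=256)
def extract_user_cached(text: str) -> User:
    # Identical inputs are served from the in-process cache instead of the API
    return client.generate(
        prompt=f"Extract: {text}",
        model="cerebras/btlm-3b-8k",
        response_model=User,
    )

first = extract_user_cached("Jason is 25 years old")   # hits the API
second = extract_user_cached("Jason is 25 years old")  # served from cache
```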
diff --git a/docs/integrations/cohere.md b/docs/integrations/cohere.md new file mode 100644 index 000000000..5dc87fe59 --- /dev/null +++ b/docs/integrations/cohere.md @@ -0,0 +1,223 @@ +--- +title: "Cohere Integration with Instructor | Structured Output Guide" +description: "Complete guide to using Instructor with Cohere's language models. Learn how to generate structured, type-safe outputs with enterprise-ready AI capabilities." +--- + +# Cohere Integration with Instructor + +Cohere provides powerful language models optimized for enterprise use cases. This guide shows you how to use Instructor with Cohere's models for type-safe, validated responses. + +## Quick Start + +Install Instructor with Cohere support: + +```bash +pip install "instructor[cohere]" +``` + +## Simple User Example (Sync) + +```python +import cohere +import instructor +from pydantic import BaseModel + +# Initialize the client +client = cohere.Client('your_api_key') + +# Enable instructor patches +client = instructor.from_cohere(client) + +class User(BaseModel): + name: str + age: int + +# Create structured output +user = client.generate( + prompt="Extract: Jason is 25 years old", + model='command', # or other available models + response_model=User, +) + +print(user) # User(name='Jason', age=25) +``` + +## Simple User Example (Async) + +```python +import cohere +import instructor +from pydantic import BaseModel +import asyncio + +# Initialize async client +client = cohere.AsyncClient('your_api_key') + +# Enable instructor patches +client = instructor.from_cohere(client) + +class User(BaseModel): + name: str + age: int + +async def extract_user(): + user = await client.generate( + prompt="Extract: Jason is 25 years old", + model='command', + response_model=User, + ) + return user + +# Run async function +user = asyncio.run(extract_user()) +print(user) # User(name='Jason', age=25) +``` + +## Nested Example + +```python +from pydantic import BaseModel +from typing import List + +class Address(BaseModel): + street: str + city: str + country: str + +class User(BaseModel): + name: str + age: int + addresses: List[Address] + +# Create structured output with nested objects +user = client.generate( + prompt=""" + Extract: Jason is 25 years old. + He lives at 123 Main St, New York, USA + and has a summer house at 456 Beach Rd, Miami, USA + """, + model='command', + response_model=User, +) + +print(user) # User with nested Address objects +``` + +## Partial Streaming Example + +Note: Cohere's current API does not support partial streaming of structured responses. The streaming functionality returns complete text chunks rather than partial objects. We recommend using the standard synchronous or asynchronous methods for structured output generation. + +## Iterable Example + +```python +from typing import List + +class User(BaseModel): + name: str + age: int + +# Extract multiple users from text +users = client.generate_iterable( + prompt=""" + Extract users: + 1. Jason is 25 years old + 2. Sarah is 30 years old + 3. 
Mike is 28 years old + """, + model='command', + response_model=User, +) + +for user in users: + print(user) # Prints each user as it's extracted +``` + +## Instructor Hooks + +Instructor provides several hooks to customize behavior: + +### Validation Hook + +```python +from instructor import Instructor + +def validation_hook(value, retry_count, exception): + print(f"Validation failed {retry_count} times: {exception}") + return retry_count < 3 # Retry up to 3 times + +instructor.patch(client, validation_hook=validation_hook) +``` + +### Mode Hooks + +```python +from instructor import Mode + +# Use different modes for different scenarios +client = instructor.patch(client, mode=Mode.JSON) # JSON mode +client = instructor.patch(client, mode=Mode.TOOLS) # Tools mode +client = instructor.patch(client, mode=Mode.MD_JSON) # Markdown JSON mode +``` + +### Custom Retrying + +```python +from instructor import RetryConfig + +client = instructor.patch( + client, + retry_config=RetryConfig( + max_retries=3, + on_retry=lambda *args: print("Retrying..."), + ) +) +``` + +## Available Models + +Cohere offers several model options: +- Command (Latest generation) +- Command-Light (Faster, more efficient) +- Command-Nightly (Experimental features) +- Custom-trained models (Enterprise) + +## Best Practices + +1. **Model Selection** + - Choose model based on task complexity + - Consider latency requirements + - Monitor token usage + - Use appropriate model versions + +2. **Optimization Tips** + - Structure prompts effectively + - Use appropriate temperature settings + - Implement caching strategies + - Monitor API usage + +3. **Error Handling** + - Implement proper validation + - Handle rate limits gracefully + - Monitor model responses + - Use appropriate timeout settings + +## Common Use Cases + +- Enterprise Data Processing +- Content Generation +- Document Analysis +- Semantic Search Integration +- Classification Tasks + +## Related Resources + +- [Cohere API Documentation](https://docs.cohere.com/) +- [Instructor Core Concepts](../concepts/index.md) +- [Type Validation Guide](../concepts/validation.md) +- [Advanced Usage Examples](../examples/index.md) + +## Updates and Compatibility + +Instructor maintains compatibility with Cohere's latest API versions. Check the [changelog](../../CHANGELOG.md) for updates. + +Note: Some features like partial streaming may not be available due to API limitations. Always check the latest documentation for feature availability. diff --git a/docs/integrations/fireworks.md b/docs/integrations/fireworks.md new file mode 100644 index 000000000..b470a3d01 --- /dev/null +++ b/docs/integrations/fireworks.md @@ -0,0 +1,273 @@ +--- +title: "Structured outputs with Fireworks, a complete guide w/ instructor" +description: "Complete guide to using Instructor with Fireworks AI models. Learn how to generate structured, type-safe outputs with high-performance, cost-effective AI capabilities." +--- + +# Structured outputs with Fireworks, a complete guide w/ instructor + +Fireworks provides efficient and cost-effective AI models with enterprise-grade reliability. This guide shows you how to use Instructor with Fireworks's models for type-safe, validated responses. 
+ +## Quick Start + +Install Instructor with Fireworks support: + +```bash +pip install "instructor[fireworks]" +``` + +## Simple User Example (Sync) + +```python +from fireworks.client import Client +import instructor +from pydantic import BaseModel + +# Initialize the client +client = Client(api_key='your_api_key') + +# Enable instructor patches +client = instructor.from_fireworks(client) + +class User(BaseModel): + name: str + age: int + +# Create structured output +user = client.generate( + prompt="Extract: Jason is 25 years old", + model='accounts/fireworks/models/llama-v2-7b', # or other available models + response_model=User, +) + +print(user) # User(name='Jason', age=25) +``` + +## Simple User Example (Async) + +```python +from fireworks.client import AsyncClient +import instructor +from pydantic import BaseModel +import asyncio + +# Initialize async client +client = AsyncClient(api_key='your_api_key') + +# Enable instructor patches +client = instructor.from_fireworks(client) + +class User(BaseModel): + name: str + age: int + +async def extract_user(): + user = await client.generate( + prompt="Extract: Jason is 25 years old", + model='accounts/fireworks/models/llama-v2-7b', + response_model=User, + ) + return user + +# Run async function +user = asyncio.run(extract_user()) +print(user) # User(name='Jason', age=25) +``` + +## Nested Example + +```python +from pydantic import BaseModel +from typing import List + +class Address(BaseModel): + street: str + city: str + country: str + +class User(BaseModel): + name: str + age: int + addresses: List[Address] + +# Create structured output with nested objects +user = client.generate( + prompt=""" + Extract: Jason is 25 years old. + He lives at 123 Main St, New York, USA + and has a summer house at 456 Beach Rd, Miami, USA + """, + model='accounts/fireworks/models/llama-v2-7b', + response_model=User, +) + +print(user) # User with nested Address objects +``` + +## Streaming Support and Limitations + +Fireworks provides streaming capabilities with some limitations: + +- **Full Streaming**: ⚠️ Limited support (model-dependent) +- **Partial Streaming**: ⚠️ Limited support (may experience inconsistent behavior) +- **Iterable Streaming**: ✅ Supported +- **Async Support**: ✅ Supported + +### Partial Streaming Example + +```python +class User(BaseModel): + name: str + age: int + bio: str + +# Stream partial objects as they're generated +for partial_user in client.stream_generate( + prompt="Create a user profile for Jason, age 25", + model='accounts/fireworks/models/llama-v2-7b', + response_model=User, +): + print(f"Current state: {partial_user}") + # Fields will populate gradually as they're generated +``` + +**Important Notes on Streaming:** +- Full streaming support varies by model and configuration +- Partial streaming has limited support and may require additional error handling +- Some models may not support streaming at all +- Consider implementing fallback mechanisms for streaming scenarios +- Test streaming capabilities with your specific model before deployment +- Monitor streaming performance and implement appropriate error handling +- For production use, implement non-streaming fallbacks + +### Model-Specific Streaming Support + +1. **Llama-2 Models** + - Basic streaming support + - May experience chunked responses + - Recommended for non-critical streaming use cases + +2. **Mistral Models** + - Limited streaming support + - Better suited for non-streaming operations + - Use with appropriate fallback mechanisms + +3. 
**Custom Models** + - Streaming capabilities vary + - Requires thorough testing + - May need model-specific optimizations + +## Iterable Example + +```python +from typing import List + +class User(BaseModel): + name: str + age: int + +# Extract multiple users from text +users = client.generate_iterable( + prompt=""" + Extract users: + 1. Jason is 25 years old + 2. Sarah is 30 years old + 3. Mike is 28 years old + """, + model='accounts/fireworks/models/llama-v2-7b', + response_model=User, +) + +for user in users: + print(user) # Prints each user as it's extracted +``` + +## Instructor Hooks + +Instructor provides several hooks to customize behavior: + +### Validation Hook + +```python +from instructor import Instructor + + +def validation_hook(value, retry_count, exception): + print(f"Validation failed {retry_count} times: {exception}") + return retry_count < 3 # Retry up to 3 times + +instructor.patch(client, validation_hook=validation_hook) +``` + +### Mode Hooks + +```python +from instructor import Mode + +# Use different modes for different scenarios +client = instructor.patch(client, mode=Mode.JSON) # JSON mode +client = instructor.patch(client, mode=Mode.TOOLS) # Tools mode +client = instructor.patch(client, mode=Mode.MD_JSON) # Markdown JSON mode +``` + +### Custom Retrying + +```python +from instructor import RetryConfig + +client = instructor.patch( + client, + retry_config=RetryConfig( + max_retries=3, + on_retry=lambda *args: print("Retrying..."), + ) +) +``` + +## Available Models + +Fireworks offers several model options: +- Llama-2 (various sizes) +- Mistral (various configurations) +- Custom fine-tuned models +- Enterprise deployments + +## Best Practices + +1. **Model Selection** + - Choose models with known streaming support + - Consider cost-performance ratio + - Monitor usage and costs + - Use appropriate context lengths + +2. **Optimization Tips** + - Implement proper caching + - Use non-streaming fallbacks + - Monitor token usage + - Use appropriate temperature settings + +3. **Error Handling** + - Implement streaming-specific error handling + - Handle rate limits + - Monitor model responses + - Use appropriate timeout settings + +## Common Use Cases + +- Enterprise Applications +- Cost-Effective Processing +- High-Performance Computing +- Research Applications +- Production Deployments + +## Related Resources + +- [Fireworks Documentation](https://docs.fireworks.ai/) +- [Instructor Core Concepts](../concepts/index.md) +- [Type Validation Guide](../concepts/validation.md) +- [Advanced Usage Examples](../examples/index.md) + +## Updates and Compatibility + +Instructor maintains compatibility with Fireworks's latest API versions. Check the [changelog](../../CHANGELOG.md) for updates. + +Note: Always verify model-specific features and limitations before implementing streaming functionality in production environments. diff --git a/docs/integrations/google.md b/docs/integrations/google.md new file mode 100644 index 000000000..6df5d9a97 --- /dev/null +++ b/docs/integrations/google.md @@ -0,0 +1,246 @@ +--- +title: "Structured outputs with Google/Gemini, a complete guide w/ instructor" +description: "Complete guide to using Instructor with Google's Gemini models. Learn how to generate structured, type-safe outputs with Google's advanced AI capabilities." +--- + +# Structured outputs with Google/Gemini, a complete guide w/ instructor + +Google's Gemini models provide powerful AI capabilities with multimodal support. 
This guide shows you how to use Instructor with Google's Gemini models for type-safe, validated responses. + +## Quick Start + +Install Instructor with Google support: + +```bash +pip install "instructor[google]" +``` + +## Simple User Example (Sync) + +```python +from google.generativeai import GenerativeModel +import instructor +from pydantic import BaseModel + +# Initialize the client +model = GenerativeModel('gemini-pro') + +# Enable instructor patches +client = instructor.from_google(model) + +class User(BaseModel): + name: str + age: int + +# Create structured output +user = client.generate_content( + prompt="Extract: Jason is 25 years old", + response_model=User, +) + +print(user) # User(name='Jason', age=25) +``` + +## Simple User Example (Async) + +```python +from google.generativeai import GenerativeModel +import instructor +from pydantic import BaseModel +import asyncio + +# Initialize async client +model = GenerativeModel('gemini-pro') + +# Enable instructor patches +client = instructor.from_google(model) + +class User(BaseModel): + name: str + age: int + +async def extract_user(): + user = await client.generate_content_async( + prompt="Extract: Jason is 25 years old", + response_model=User, + ) + return user + +# Run async function +user = asyncio.run(extract_user()) +print(user) # User(name='Jason', age=25) +``` + +## Nested Example + +```python +from pydantic import BaseModel +from typing import List + +class Address(BaseModel): + street: str + city: str + country: str + +class User(BaseModel): + name: str + age: int + addresses: List[Address] + +# Create structured output with nested objects +user = client.generate_content( + prompt=""" + Extract: Jason is 25 years old. + He lives at 123 Main St, New York, USA + and has a summer house at 456 Beach Rd, Miami, USA + """, + response_model=User, +) + +print(user) # User with nested Address objects +``` + +## Streaming Support and Limitations + +Google's Gemini models provide streaming capabilities with some limitations: + +- **Full Streaming**: ✅ Supported +- **Partial Streaming**: ⚠️ Limited support (may experience inconsistent behavior) +- **Iterable Streaming**: ✅ Supported +- **Async Support**: ✅ Supported + +### Partial Streaming Example + +```python +class User(BaseModel): + name: str + age: int + bio: str + +# Stream partial objects as they're generated +for partial_user in client.generate_content_stream( + prompt="Create a user profile for Jason, age 25", + response_model=User, +): + print(f"Current state: {partial_user}") + # Fields will populate gradually as they're generated +``` + +**Important Notes on Streaming:** +- Full streaming is well-supported for complete response generation +- Partial streaming has limited support and may require additional error handling +- Some responses may arrive in larger chunks rather than field-by-field +- Consider implementing fallback mechanisms for partial streaming scenarios +- Monitor streaming performance and implement appropriate error handling +- Test thoroughly with your specific use case before deploying to production + +## Iterable Example + +```python +from typing import List + +class User(BaseModel): + name: str + age: int + +# Extract multiple users from text +users = client.generate_content_iterable( + prompt=""" + Extract users: + 1. Jason is 25 years old + 2. Sarah is 30 years old + 3. 
Mike is 28 years old + """, + response_model=User, +) + +for user in users: + print(user) # Prints each user as it's extracted +``` + +## Instructor Hooks + +Instructor provides several hooks to customize behavior: + +### Validation Hook + +```python +from instructor import Instructor + +def validation_hook(value, retry_count, exception): + print(f"Validation failed {retry_count} times: {exception}") + return retry_count < 3 # Retry up to 3 times + +instructor.patch(client, validation_hook=validation_hook) +``` + +### Mode Hooks + +```python +from instructor import Mode + +# Use different modes for different scenarios +client = instructor.patch(client, mode=Mode.JSON) # JSON mode +client = instructor.patch(client, mode=Mode.TOOLS) # Tools mode +client = instructor.patch(client, mode=Mode.MD_JSON) # Markdown JSON mode +``` + +### Custom Retrying + +```python +from instructor import RetryConfig + +client = instructor.patch( + client, + retry_config=RetryConfig( + max_retries=3, + on_retry=lambda *args: print("Retrying..."), + ) +) +``` + +## Available Models + +Google offers several Gemini models: +- Gemini Pro (General purpose) +- Gemini Pro Vision (Multimodal) +- Gemini Ultra (Coming soon) + +## Best Practices + +1. **Model Selection** + - Choose model based on task requirements + - Consider multimodal needs + - Monitor quota usage + - Use appropriate context lengths + +2. **Optimization Tips** + - Structure prompts effectively + - Use appropriate temperature settings + - Implement caching strategies + - Monitor API usage + +3. **Error Handling** + - Implement proper validation + - Handle quota limits gracefully + - Monitor model responses + - Use appropriate timeout settings + +## Common Use Cases + +- Data Extraction +- Content Generation +- Document Analysis +- Multimodal Processing +- Complex Reasoning Tasks + +## Related Resources + +- [Google AI Documentation](https://ai.google.dev/) +- [Instructor Core Concepts](../concepts/index.md) +- [Type Validation Guide](../concepts/validation.md) +- [Advanced Usage Examples](../examples/index.md) + +## Updates and Compatibility + +Instructor maintains compatibility with Google's latest API versions. Check the [changelog](../../CHANGELOG.md) for updates. diff --git a/docs/integrations/litellm.md b/docs/integrations/litellm.md new file mode 100644 index 000000000..e15152d26 --- /dev/null +++ b/docs/integrations/litellm.md @@ -0,0 +1,287 @@ +--- +title: "Structured outputs with LiteLLM, a complete guide w/ instructor" +description: "Complete guide to using Instructor with LiteLLM's unified interface. Learn how to generate structured, type-safe outputs across multiple LLM providers." +--- + +# Structured outputs with LiteLLM, a complete guide w/ instructor + +LiteLLM provides a unified interface for multiple LLM providers, making it easy to switch between different models and providers. This guide shows you how to use Instructor with LiteLLM for type-safe, validated responses across various LLM providers. 
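Because the interface is unified, switching providers usually amounts to changing the model string while the extraction code stays the same. A minimal sketch, assuming the provider API keys are already set in the environment and that `instructor.from_litellm` exposes the OpenAI-style `chat.completions.create` call (the model names are only examples):

```python
import instructor
from litellm import completion
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

# Wrap LiteLLM's completion function with Instructor
client = instructor.from_litellm(completion)

# The same structured extraction, routed to two different providers
for model in ("gpt-4o-mini", "claude-3-haiku-20240307"):
    user = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
        response_model=User,
    )
    print(model, user)
```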
+ +## Quick Start + +Install Instructor with LiteLLM support: + +```bash +pip install "instructor[litellm]" +``` + +## Simple User Example (Sync) + +```python +from litellm import completion +import instructor +from pydantic import BaseModel + +# Enable instructor patches +client = instructor.from_litellm() + +class User(BaseModel): + name: str + age: int + +# Create structured output +user = client.completion( + model="gpt-3.5-turbo", # Can use any supported model + messages=[ + {"role": "user", "content": "Extract: Jason is 25 years old"}, + ], + response_model=User, +) + +print(user) # User(name='Jason', age=25) +``` + +## Simple User Example (Async) + +```python +from litellm import acompletion +import instructor +from pydantic import BaseModel +import asyncio + +# Enable instructor patches for async +client = instructor.from_litellm() + +class User(BaseModel): + name: str + age: int + +async def extract_user(): + user = await client.acompletion( + model="gpt-3.5-turbo", + messages=[ + {"role": "user", "content": "Extract: Jason is 25 years old"}, + ], + response_model=User, + ) + return user + +# Run async function +user = asyncio.run(extract_user()) +print(user) # User(name='Jason', age=25) +``` + +## Nested Example + +```python +from pydantic import BaseModel +from typing import List + +class Address(BaseModel): + street: str + city: str + country: str + +class User(BaseModel): + name: str + age: int + addresses: List[Address] + +# Create structured output with nested objects +user = client.completion( + model="gpt-3.5-turbo", + messages=[ + {"role": "user", "content": """ + Extract: Jason is 25 years old. + He lives at 123 Main St, New York, USA + and has a summer house at 456 Beach Rd, Miami, USA + """}, + ], + response_model=User, +) + +print(user) # User with nested Address objects +``` + +## Streaming Support and Limitations + +LiteLLM's streaming capabilities vary by provider. Here's a comprehensive breakdown: + +### Provider-Specific Streaming Support + +| Provider | Full Streaming | Partial Streaming | Iterable Streaming | Async Support | +|----------|---------------|-------------------|-------------------|---------------| +| OpenAI | ✅ Full | ✅ Full | ✅ Full | ✅ Full | +| Anthropic| ✅ Full | ✅ Full | ✅ Full | ✅ Full | +| Azure | ✅ Full | ✅ Full | ✅ Full | ✅ Full | +| Google | ✅ Full | ⚠️ Limited | ✅ Full | ✅ Full | +| Cohere | ❌ None | ❌ None | ✅ Full | ✅ Full | +| AWS | ⚠️ Limited | ⚠️ Limited | ✅ Full | ✅ Full | +| Mistral | ❌ None | ❌ None | ✅ Full | ✅ Full | + +### Partial Streaming Example + +```python +class User(BaseModel): + name: str + age: int + bio: str + +# Stream partial objects as they're generated +for partial_user in client.stream_completion( + model="gpt-3.5-turbo", # Choose a provider with streaming support + messages=[ + {"role": "user", "content": "Create a user profile for Jason, age 25"}, + ], + response_model=User, +): + print(f"Current state: {partial_user}") + # Fields will populate gradually as they're generated +``` + +**Important Notes on Streaming:** +- Streaming capabilities depend entirely on the chosen provider +- Some providers may not support streaming at all +- Partial streaming behavior varies significantly between providers +- Always implement fallback mechanisms for providers without streaming +- Test streaming functionality with your specific provider before deployment +- Consider implementing provider-specific error handling +- Monitor streaming performance across different providers + +### Provider-Specific Considerations + +1. 
**OpenAI/Azure/Anthropic** + - Full streaming support + - Reliable partial streaming + - Consistent performance + +2. **Google/AWS** + - Limited partial streaming + - May require additional error handling + - Consider implementing fallbacks + +3. **Cohere/Mistral** + - No streaming support + - Use non-streaming alternatives + - Implement appropriate fallbacks + +## Iterable Example + +```python +from typing import List + +class User(BaseModel): + name: str + age: int + +# Extract multiple users from text +users = client.completion_iterable( + model="gpt-3.5-turbo", + messages=[ + {"role": "user", "content": """ + Extract users: + 1. Jason is 25 years old + 2. Sarah is 30 years old + 3. Mike is 28 years old + """}, + ], + response_model=User, +) + +for user in users: + print(user) # Prints each user as it's extracted +``` + +## Instructor Hooks + +Instructor provides several hooks to customize behavior: + +### Validation Hook + +```python +from instructor import Instructor + +def validation_hook(value, retry_count, exception): + print(f"Validation failed {retry_count} times: {exception}") + return retry_count < 3 # Retry up to 3 times + +instructor.patch(client, validation_hook=validation_hook) +``` + +### Mode Hooks + +```python +from instructor import Mode + +# Use different modes for different scenarios +client = instructor.patch(client, mode=Mode.JSON) # JSON mode +client = instructor.patch(client, mode=Mode.TOOLS) # Tools mode +client = instructor.patch(client, mode=Mode.MD_JSON) # Markdown JSON mode +``` + +### Custom Retrying + +```python +from instructor import RetryConfig + +client = instructor.patch( + client, + retry_config=RetryConfig( + max_retries=3, + on_retry=lambda *args: print("Retrying..."), + ) +) +``` + +## Supported Providers + +LiteLLM supports multiple providers: +- OpenAI +- Anthropic +- Azure +- AWS Bedrock +- Google Vertex AI +- Cohere +- Hugging Face +- And many more + +## Best Practices + +1. **Provider Selection** + - Choose providers based on streaming requirements + - Consider cost and performance + - Monitor usage across providers + - Implement provider-specific fallback strategies + +2. **Optimization Tips** + - Use provider-specific features + - Implement proper caching + - Monitor costs across providers + - Handle provider-specific errors + +3. **Error Handling** + - Implement provider-specific handling + - Use proper fallback logic + - Monitor provider availability + - Handle rate limits properly + +## Common Use Cases + +- Multi-Provider Applications +- Provider Fallback Systems +- Cost Optimization +- Cross-Provider Testing +- Unified API Integration + +## Related Resources + +- [LiteLLM Documentation](https://docs.litellm.ai/) +- [Instructor Core Concepts](../concepts/index.md) +- [Type Validation Guide](../concepts/validation.md) +- [Advanced Usage Examples](../examples/index.md) + +## Updates and Compatibility + +Instructor maintains compatibility with LiteLLM's latest releases. Check the [changelog](../../CHANGELOG.md) for updates. + +Note: Always verify provider-specific features and limitations in their respective documentation before implementation. diff --git a/docs/integrations/llama-cpp-python.md b/docs/integrations/llama-cpp-python.md new file mode 100644 index 000000000..d96ead636 --- /dev/null +++ b/docs/integrations/llama-cpp-python.md @@ -0,0 +1,244 @@ +--- +title: "Structured outputs with llama-cpp-python, a complete guide w/ instructor" +description: "Complete guide to using Instructor with llama-cpp-python for local LLM deployment. 
Learn about performance considerations, limitations, and best practices for structured outputs." +--- + +# Structured outputs with llama-cpp-python, a complete guide w/ instructor + +llama-cpp-python provides Python bindings for llama.cpp, enabling local deployment of LLMs. This guide shows you how to use Instructor with llama-cpp-python for type-safe, validated responses while being aware of important performance considerations and limitations. + +## Important Limitations + +Before getting started, be aware of these critical limitations: + +### Performance Considerations +- **CPU-Only Execution**: Currently runs on CPU only, which significantly impacts performance +- **Long Inference Times**: Expect 30-60+ seconds for simple extractions on CPU +- **Context Window Management**: + - Default context size is 2048 tokens (configurable) + - Larger contexts (>4096) may require more memory + - Adjust n_ctx based on your needs and available memory +- **Memory Usage**: Requires ~4GB of RAM for model loading + +### Streaming Support +- **Basic Streaming**: ✓ Supported and verified working +- **Structured Output Streaming**: ✓ Supported with limitations + - Chunks are delivered in larger intervals compared to cloud providers + - Response time may be slower due to CPU-only processing + - Partial objects stream correctly but with higher latency +- **Async Support**: ❌ Not supported (AsyncLlama is not available) + +## Quick Start + +Install Instructor with llama-cpp-python support: + +```bash +pip install "instructor[llama-cpp-python]" +``` + +## Simple User Example (Sync) + +```python +from llama_cpp import Llama +from instructor import patch +from pydantic import BaseModel + +# Initialize the model with appropriate settings +llm = Llama( + model_path="path/to/your/gguf/model", + n_ctx=2048, # Adjust based on your needs and memory constraints + n_batch=32 # Adjust for performance vs memory trade-off +) + +# Enable instructor patches +client = patch(llm) + +class User(BaseModel): + name: str + age: int + +# Create structured output +user = client.chat.create( + messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}], + response_model=User, + max_tokens=100, + temperature=0.1 +) + +print(user) # User(name='Jason', age=25) +``` + +## Nested Example + +```python +from pydantic import BaseModel +from typing import List + +class Address(BaseModel): + street: str + city: str + country: str + +class User(BaseModel): + name: str + age: int + addresses: List[Address] + +# Create structured output with nested objects +user = client.chat.create( + messages=[{ + "role": "user", + "content": """ + Extract: Jason is 25 years old. 
+ He lives at 123 Main St, New York, USA + and has a summer house at 456 Beach Rd, Miami, USA + """ + }], + response_model=User, + max_tokens=200, + temperature=0.1 +) + +print(user) # User with nested Address objects +``` + +## Partial Streaming Example + +```python +class User(BaseModel): + name: str + age: int + bio: str + +# Stream partial objects as they're generated +for partial_user in client.chat.create( + messages=[{"role": "user", "content": "Create a user profile for Jason, age 25"}], + response_model=User, + max_tokens=100, + temperature=0.1, + stream=True +): + print(f"Current state: {partial_user}") + # Fields will populate gradually as they're generated +``` + +## Iterable Example + +```python +from typing import List + +class User(BaseModel): + name: str + age: int + +# Extract multiple users from text +users = client.chat.create( + messages=[{ + "role": "user", + "content": """ + Extract users: + 1. Jason is 25 years old + 2. Sarah is 30 years old + 3. Mike is 28 years old + """ + }], + response_model=User, + max_tokens=100, + temperature=0.1 +) + +for user in users: + print(user) # Prints each user as it's extracted +``` + +## Instructor Hooks + +Instructor provides several hooks to customize behavior: + +### Validation Hook + +```python +from instructor import patch + +def validation_hook(value, retry_count, exception): + print(f"Validation failed {retry_count} times: {exception}") + return retry_count < 3 # Retry up to 3 times + +patch(client, validation_hook=validation_hook) +``` + +### Mode Hooks + +```python +from instructor import Mode + +# Use different modes for different scenarios +client = patch(client, mode=Mode.JSON) # JSON mode +client = patch(client, mode=Mode.TOOLS) # Tools mode +client = patch(client, mode=Mode.MD_JSON) # Markdown JSON mode +``` + +### Custom Retrying + +```python +from instructor import RetryConfig + +client = patch( + client, + retry_config=RetryConfig( + max_retries=3, + on_retry=lambda *args: print("Retrying..."), + ) +) +``` + +## Model Configuration and Performance Considerations + +### Hardware Requirements and Limitations +- **CPU-Only Operation**: Currently, the implementation runs on CPU only +- **Memory Usage**: Requires approximately 4GB RAM for model loading +- **Processing Speed**: Expect significant processing times (30-60+ seconds) for simple extractions + +### Key Configuration Options +- `n_ctx`: Context window size (default: 2048, limited compared to training context of 4096) +- `n_batch`: Batch size for prompt processing (adjust for memory/performance trade-off) +- `n_threads`: Number of CPU threads to use (optimize based on your hardware) + +## Best Practices + +1. **Resource Management** + - Monitor CPU usage and memory consumption + - Keep prompts concise due to context window limitations + - Implement appropriate timeouts for long-running operations + - Consider request queuing for multiple users + +2. **Model Selection** + - Use quantized models to reduce memory usage + - Balance model size vs performance needs + - Consider smaller models for faster inference + - Test with your specific use case + +3. 
**Performance Optimization** + - Batch similar requests when possible + - Implement caching strategies + - Use appropriate timeout values + - Monitor and log performance metrics + +## Common Use Cases + +- Local Development +- Privacy-Sensitive Applications +- Edge Computing +- Offline Processing +- Resource-Constrained Environments + +## Related Resources + +- [llama-cpp-python Documentation](https://llama-cpp-python.readthedocs.io/) +- [Instructor Core Concepts](../concepts/index.md) +- [Type Validation Guide](../concepts/validation.md) +- [Advanced Usage Examples](../examples/index.md) + +## Updates and Compatibility + +Instructor maintains compatibility with the latest llama-cpp-python releases. Check the [changelog](../../CHANGELOG.md) for updates. diff --git a/docs/integrations/mistral.md b/docs/integrations/mistral.md new file mode 100644 index 000000000..a516ce5e0 --- /dev/null +++ b/docs/integrations/mistral.md @@ -0,0 +1,234 @@ +--- +title: "Structured outputs with Mistral, a complete guide w/ instructor" +description: "Complete guide to using Instructor with Mistral and Mixtral models. Learn how to generate structured, type-safe outputs with these powerful open-source models." +--- + +# Mistral & Mixtral Integration with Instructor + +Mistral AI's models, including Mistral and Mixtral, offer powerful open-source alternatives for structured output generation. This guide shows you how to leverage these models with Instructor for type-safe, validated responses. + +## Quick Start + +Install Instructor with Mistral support: + +```bash +pip install "instructor[mistralai]" +``` + +## Simple User Example (Sync) + +```python +from mistralai.client import MistralClient +import instructor +from pydantic import BaseModel + +# Enable instructor patches for Mistral client +client = instructor.from_mistral(MistralClient(), mode=instructor.Mode.MISTRAL_TOOLS) + +class User(BaseModel): + name: str + age: int + +# Create structured output +user = client.chat.complete( + model="mistral-large-latest", # or "mixtral-8x7b-instruct" + messages=[ + {"role": "user", "content": "Extract: Jason is 25 years old"}, + ], + response_model=User, +) + +print(user) # User(name='Jason', age=25) +``` + +## Simple User Example (Async) + +```python +from mistralai.async_client import MistralAsyncClient +import instructor +from pydantic import BaseModel +import asyncio + +# Enable instructor patches for async Mistral client +client = instructor.from_mistral(MistralAsyncClient(), mode=instructor.Mode.MISTRAL_TOOLS, use_async=True) + +class User(BaseModel): + name: str + age: int + +async def extract_user(): + user = await client.chat.complete( + model="mistral-large-latest", + messages=[ + {"role": "user", "content": "Extract: Jason is 25 years old"}, + ], + response_model=User, + ) + return user + +# Run async function +user = asyncio.run(extract_user()) +print(user) # User(name='Jason', age=25) +``` + +## Nested Example + +```python +from pydantic import BaseModel +from typing import List + +class Address(BaseModel): + street: str + city: str + country: str + +class User(BaseModel): + name: str + age: int + addresses: List[Address] + +# Create structured output with nested objects +user = client.chat.complete( + model="mixtral-8x7b-instruct", + messages=[ + {"role": "user", "content": """ + Extract: Jason is 25 years old. 
+ He lives at 123 Main St, New York, USA + and has a summer house at 456 Beach Rd, Miami, USA + """}, + ], + response_model=User, +) + +print(user) # User with nested Address objects +``` + +## Streaming Support + +Mistral models have limited streaming support through Instructor. Here are the current capabilities and limitations: + +1. **Full Streaming**: Not currently supported +2. **Partial Streaming**: Not currently supported +3. **Iterable Streaming**: Limited support for multiple object extraction +4. **Async Support**: Available for non-streaming operations + +### Streaming Limitations +- Full streaming is not currently implemented +- Partial streaming is not available +- Iterable responses must be processed as complete responses +- Use async client for better performance with large responses + +### Performance Considerations +- Use batch processing for multiple extractions +- Implement proper error handling +- Consider response size limitations +- Set appropriate timeouts for large responses + +## Iterable Example + +```python +from typing import List + +class User(BaseModel): + name: str + age: int + +# Extract multiple users from text +users = client.chat.complete( + model="mixtral-8x7b-instruct", + messages=[ + {"role": "user", "content": """ + Extract users: + 1. Jason is 25 years old + 2. Sarah is 30 years old + 3. Mike is 28 years old + """}, + ], + response_model=User, +) + +print(users) # Prints complete response +``` + +## Instructor Hooks + +Instructor provides several hooks to customize behavior: + +### Validation Hook + +```python +from instructor import Instructor + +def validation_hook(value, retry_count, exception): + print(f"Validation failed {retry_count} times: {exception}") + return retry_count < 3 # Retry up to 3 times + +client = instructor.from_mistral(client, validation_hook=validation_hook) +``` + +### Mode Selection + +```python +from instructor import Mode + +# Use MISTRAL_TOOLS mode for best results +client = instructor.from_mistral(client, mode=Mode.MISTRAL_TOOLS) +``` + +### Custom Retrying + +```python +from instructor import RetryConfig + +client = instructor.from_mistral( + client, + retry_config=RetryConfig( + max_retries=3, + on_retry=lambda *args: print("Retrying..."), + ) +) +``` + +## Model Options + +Mistral AI provides several powerful models: +- Mistral-7B +- Mixtral-8x7B +- Custom fine-tuned variants +- Hosted API options + +## Best Practices + +1. **Model Selection** + - Use Mixtral-8x7B for complex tasks + - Mistral-7B for simpler extractions + - Consider latency requirements + +2. **Optimization Tips** + - Use async client for better performance + - Implement proper error handling + - Monitor token usage + +3. **Deployment Considerations** + - Self-hosted vs. API options + - Resource requirements + - Scaling strategies + +## Common Use Cases + +- Data Extraction +- Content Structuring +- API Response Formatting +- Document Analysis +- Configuration Generation + +## Related Resources + +- [Mistral AI Documentation](https://docs.mistral.ai/) +- [Instructor Core Concepts](../concepts/index.md) +- [Type Validation Guide](../concepts/validation.md) +- [Advanced Usage Examples](../examples/index.md) + +## Updates and Compatibility + +Instructor maintains compatibility with the latest Mistral AI releases. Check the [changelog](../../CHANGELOG.md) for updates. 
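As noted under Optimization Tips, the async client pairs naturally with batch processing: independent extractions can be issued concurrently rather than one at a time. A minimal sketch using `asyncio.gather`, following the async call pattern shown earlier in this guide:

```python
import asyncio
import instructor
from mistralai.async_client import MistralAsyncClient
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

client = instructor.from_mistral(
    MistralAsyncClient(), mode=instructor.Mode.MISTRAL_TOOLS, use_async=True
)

async def extract(text: str) -> User:
    return await client.chat.complete(
        model="mistral-large-latest",
        messages=[{"role": "user", "content": f"Extract: {text}"}],
        response_model=User,
    )

async def main():
    texts = [
        "Jason is 25 years old",
        "Sarah is 30 years old",
        "Mike is 28 years old",
    ]
    # Run all three extractions concurrently instead of sequentially
    users = await asyncio.gather(*(extract(t) for t in texts))
    for user in users:
        print(user)

asyncio.run(main())
```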
diff --git a/docs/integrations/ollama.md b/docs/integrations/ollama.md new file mode 100644 index 000000000..e50681550 --- /dev/null +++ b/docs/integrations/ollama.md @@ -0,0 +1,283 @@ +--- +title: "Structured outputs with Ollama, a complete guide w/ instructor" +description: "Complete guide to using Instructor with Ollama for local LLM deployment. Learn how to generate structured, type-safe outputs with locally hosted models." +--- + +# Structured outputs with Ollama, a complete guide w/ instructor + +Ollama provides an easy way to run large language models locally. This guide shows you how to use Instructor with Ollama for type-safe, validated responses while maintaining complete control over your data and infrastructure. + +## Important Limitations + +Before getting started, please note these important limitations when using Instructor with Ollama: + +1. **No Function Calling/Tools Support**: Ollama does not support OpenAI's function calling or tools mode. You'll need to use JSON mode instead. +2. **Limited Streaming Support**: Streaming features like `create_partial` are not available. +3. **Mode Restrictions**: Only JSON mode is supported. Tools, MD_JSON, and other modes are not available. +4. **Memory Requirements**: Different models have varying memory requirements: + - Llama 2 (default): Requires 8.4GB+ system memory + - Mistral-7B: Requires 4.5GB+ system memory + - For memory-constrained systems (< 8GB RAM), use quantized models like `mistral-7b-instruct-v0.2-q4` + +## Quick Start + +Install Instructor with OpenAI compatibility (Ollama uses OpenAI-compatible endpoints): + +```bash +pip install "instructor[openai]" +``` + +Make sure you have Ollama installed and running locally. Visit [Ollama's installation guide](https://ollama.ai/download) for setup instructions. 
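+
+Before running the examples below, pull the model you plan to use (for example with `ollama pull llama2`) and confirm the server is reachable. A minimal reachability check, assuming the default local endpoint:
+
+```python
+import requests
+
+# The Ollama server answers plain GET requests on its root URL while it is running.
+resp = requests.get("http://localhost:11434")
+resp.raise_for_status()
+print(resp.text)  # typically "Ollama is running"
+```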
+ +## Simple User Example (Sync) + +```python +import openai +import instructor +from pydantic import BaseModel + +# Configure OpenAI client with Ollama endpoint +client = openai.OpenAI( + base_url="http://localhost:11434/v1", + api_key="ollama" # Ollama doesn't require an API key +) + +# Enable instructor patches with JSON mode +client = instructor.patch(client, mode=instructor.Mode.JSON) + +class User(BaseModel): + name: str + age: int + +# Create structured output +user = client.chat.completions.create( + model="mistral-7b-instruct-v0.2-q4", # Recommended for memory-constrained systems + messages=[ + {"role": "user", "content": "Extract: Jason is 25 years old"}, + ], + response_model=User, +) + +print(user) # User(name='Jason', age=25) +``` + +## Simple User Example (Async) + +```python +import openai +import instructor +from pydantic import BaseModel +import asyncio + +# Configure async OpenAI client with Ollama endpoint +client = openai.AsyncOpenAI( + base_url="http://localhost:11434/v1", + api_key="ollama" +) + +# Enable instructor patches with JSON mode +client = instructor.patch(client, mode=instructor.Mode.JSON) + +class User(BaseModel): + name: str + age: int + +async def extract_user(): + user = await client.chat.completions.create( + model="llama2", + messages=[ + {"role": "user", "content": "Extract: Jason is 25 years old"}, + ], + response_model=User, + ) + return user + +# Run async function +user = asyncio.run(extract_user()) +print(user) # User(name='Jason', age=25) +``` + +## Nested Example + +```python +from pydantic import BaseModel +from typing import List + +class Address(BaseModel): + street: str + city: str + country: str + +class User(BaseModel): + name: str + age: int + addresses: List[Address] + +# Create structured output with nested objects +user = client.chat.completions.create( + model="llama2", + messages=[ + {"role": "user", "content": """ + Extract: Jason is 25 years old. + He lives at 123 Main St, New York, USA + and has a summer house at 456 Beach Rd, Miami, USA + """}, + ], + response_model=User, +) + +print(user) # User with nested Address objects +``` + +## Alternative to Streaming + +Since Ollama doesn't support streaming with `create_partial`, you can achieve similar results by breaking down your requests into smaller chunks: + +```python +class User(BaseModel): + name: str + age: int + bio: Optional[str] = None + +# First, extract basic information +user = client.chat.completions.create( + model="llama2", + messages=[ + {"role": "user", "content": "Extract basic info: Jason is 25 years old"}, + ], + response_model=User, +) + +# Then, add additional information in separate requests +user_with_bio = client.chat.completions.create( + model="llama2", + messages=[ + {"role": "user", "content": f"Generate a short bio for {user.name}, who is {user.age} years old"}, + ], + response_model=User, +) +``` + +## Multiple Items Extraction + +Instead of using `create_iterable`, which relies on streaming, you can extract multiple items using a list: + +```python +from typing import List + +class User(BaseModel): + name: str + age: int + +class UserList(BaseModel): + users: List[User] + +# Extract multiple users from text +response = client.chat.completions.create( + model="llama2", + messages=[ + {"role": "user", "content": """ + Extract users: + 1. Jason is 25 years old + 2. Sarah is 30 years old + 3. 
Mike is 28 years old + """}, + ], + response_model=UserList, +) + +for user in response.users: + print(user) # Prints each user +``` + +## Instructor Hooks + +Instructor provides several hooks to customize behavior: + +### Validation Hook + +```python +from instructor import Instructor + +def validation_hook(value, retry_count, exception): + print(f"Validation failed {retry_count} times: {exception}") + return retry_count < 3 # Retry up to 3 times + +instructor.patch(client, validation_hook=validation_hook) +``` + +### Mode Selection + +```python +from instructor import Mode + +# Ollama only supports JSON mode +client = instructor.patch(client, mode=Mode.JSON) +``` + +### Custom Retrying + +```python +from instructor import RetryConfig + +client = instructor.patch( + client, + retry_config=RetryConfig( + max_retries=3, + on_retry=lambda *args: print("Retrying..."), + ) +) +``` + +## Available Models + +Ollama supports various models: +- Llama 2 (all variants) +- CodeLlama +- Mistral +- Custom models +- And many more via `ollama pull` + +## Best Practices + +1. **Model Selection** + - Choose model size based on hardware capabilities + - Consider memory constraints + - Balance speed and accuracy needs + +2. **Local Deployment** + - Monitor system resources + - Implement proper error handling + - Consider GPU acceleration + +3. **Performance Optimization** + - Use appropriate quantization + - Implement caching + - Monitor memory usage + +4. **Working with Limitations** + - Always use JSON mode + - Break down complex requests into smaller parts + - Implement your own batching for multiple items + - Use proper error handling for unsupported features + +## Common Use Cases + +- Local Data Processing +- Offline Development +- Privacy-Sensitive Applications +- Rapid Prototyping +- Edge Computing + +## Related Resources + +- [Ollama Documentation](https://ollama.ai/docs) +- [Instructor Core Concepts](../concepts/index.md) +- [Type Validation Guide](../concepts/validation.md) +- [Advanced Usage Examples](../examples/index.md) + +## Updates and Compatibility + +Instructor maintains compatibility with Ollama's latest releases. Check the [changelog](../../CHANGELOG.md) for updates. + +Note: Always verify model-specific features and limitations before implementation. diff --git a/docs/integrations/openai.md b/docs/integrations/openai.md new file mode 100644 index 000000000..78b0cdf2d --- /dev/null +++ b/docs/integrations/openai.md @@ -0,0 +1,292 @@ +--- +title: "Structured outputs with OpenAI, a complete guide w/ instructor" +description: "Learn how to use Instructor with OpenAI's models for type-safe, structured outputs. Complete guide with examples and best practices for GPT-4 and other OpenAI models." +--- + +# OpenAI Integration with Instructor + +OpenAI is the primary integration for Instructor, offering robust support for structured outputs with GPT-3.5, GPT-4, and future models. This guide covers everything you need to know about using OpenAI with Instructor for type-safe, validated responses. + +## Quick Start + +Install Instructor with OpenAI support: + +```bash +pip install "instructor[openai]" +``` + +⚠️ **Important**: You must set your OpenAI API key before using the client. You can do this in two ways: + +1. Set the environment variable: +```bash +export OPENAI_API_KEY='your-api-key-here' +``` + +2. 
Or provide it directly to the client: +```python +import os +from openai import OpenAI +client = OpenAI(api_key='your-api-key-here') +``` + +## Simple User Example (Sync) + +```python +import os +from openai import OpenAI +import instructor +from pydantic import BaseModel + +# Initialize with API key +client = OpenAI(api_key=os.getenv('OPENAI_API_KEY')) + +# Enable instructor patches for OpenAI client +client = instructor.from_openai(client) + +class User(BaseModel): + name: str + age: int + +# Create structured output +user = client.chat.completions.create( + model="gpt-4-turbo-preview", + messages=[ + {"role": "user", "content": "Extract: Jason is 25 years old"}, + ], + response_model=User, +) + +print(user) # User(name='Jason', age=25) +``` + +## Simple User Example (Async) + +```python +import os +from openai import AsyncOpenAI +import instructor +from pydantic import BaseModel +import asyncio + +# Initialize with API key +client = AsyncOpenAI(api_key=os.getenv('OPENAI_API_KEY')) + +# Enable instructor patches for async OpenAI client +client = instructor.from_openai(client) + +class User(BaseModel): + name: str + age: int + +async def extract_user(): + user = await client.chat.completions.create( + model="gpt-4-turbo-preview", + messages=[ + {"role": "user", "content": "Extract: Jason is 25 years old"}, + ], + response_model=User, + ) + return user + +# Run async function +user = asyncio.run(extract_user()) +print(user) # User(name='Jason', age=25) +``` + +## Nested Example + +```python +from pydantic import BaseModel +from typing import List + +class Address(BaseModel): + street: str + city: str + country: str + +class User(BaseModel): + name: str + age: int + addresses: List[Address] + +# Create structured output with nested objects +user = client.chat.completions.create( + model="gpt-4-turbo-preview", + messages=[ + {"role": "user", "content": """ + Extract: Jason is 25 years old. + He lives at 123 Main St, New York, USA + and has a summer house at 456 Beach Rd, Miami, USA + """}, + ], + response_model=User, +) + +print(user) # User with nested Address objects +``` + +## Streaming Support + +OpenAI provides comprehensive streaming support through multiple methods, but proper setup and error handling are essential: + +### Prerequisites +- Valid OpenAI API key must be set +- Appropriate model access (GPT-4, GPT-3.5-turbo) +- Proper error handling implementation + +### Available Streaming Methods + +1. **Full Streaming**: ✅ Available through standard streaming mode +2. **Partial Streaming**: ✅ Supports field-by-field streaming +3. **Iterable Streaming**: ✅ Enables streaming of multiple objects +4. **Async Streaming**: ✅ Full async/await support + +### Error Handling for Streaming + +```python +from openai import OpenAIError +import os + +class User(BaseModel): + name: str + age: int + bio: str + +try: + # Stream partial objects as they're generated + for partial_user in client.chat.completions.create_partial( + model="gpt-4-turbo-preview", + messages=[ + {"role": "user", "content": "Create a user profile for Jason, age 25"}, + ], + response_model=User, + ): + print(f"Current state: {partial_user}") +except OpenAIError as e: + if "api_key" in str(e).lower(): + print("Error: Invalid or missing API key. 
Please check your OPENAI_API_KEY environment variable.") + else: + print(f"OpenAI API error: {str(e)}") +except Exception as e: + print(f"Unexpected error: {str(e)}") +``` + +### Iterable Example with Error Handling + +```python +from typing import List +from openai import OpenAIError + +class User(BaseModel): + name: str + age: int + +try: + # Extract multiple users from text + users = client.chat.completions.create_iterable( + model="gpt-4-turbo-preview", + messages=[ + {"role": "user", "content": """ + Extract users: + 1. Jason is 25 years old + 2. Sarah is 30 years old + 3. Mike is 28 years old + """}, + ], + response_model=User, + ) + + for user in users: + print(user) # Prints each user as it's extracted +except OpenAIError as e: + print(f"OpenAI API error: {str(e)}") + if "api_key" in str(e).lower(): + print("Please ensure your API key is set correctly.") +except Exception as e: + print(f"Unexpected error: {str(e)}") +``` + +## Instructor Hooks + +Instructor provides several hooks to customize behavior: + +### Validation Hook + +```python +from instructor import Instructor + +def validation_hook(value, retry_count, exception): + print(f"Validation failed {retry_count} times: {exception}") + return retry_count < 3 # Retry up to 3 times + +instructor.patch(client, validation_hook=validation_hook) +``` + +### Mode Hooks + +```python +from instructor import Mode + +# Use different modes for different scenarios +client = instructor.patch(client, mode=Mode.JSON) # JSON mode +client = instructor.patch(client, mode=Mode.TOOLS) # Tools mode +client = instructor.patch(client, mode=Mode.MD_JSON) # Markdown JSON mode +``` + +### Custom Retrying + +```python +from instructor import RetryConfig + +client = instructor.patch( + client, + retry_config=RetryConfig( + max_retries=3, + on_retry=lambda *args: print("Retrying..."), + ) +) +``` + +## Best Practices + +1. **Model Selection** + - Use GPT-4 for complex structured outputs + - GPT-3.5-turbo for simpler schemas + - Always specify temperature=0 for consistent outputs + +2. **Error Handling** + - Implement proper validation + - Use try-except blocks for graceful failure + - Monitor validation retries + +3. **Performance Optimization** + - Use streaming for large responses + - Implement caching where appropriate + - Batch requests when possible + +## Common Use Cases + +- Data Extraction +- Form Parsing +- API Response Structuring +- Document Analysis +- Configuration Generation + +## Related Resources + +- [OpenAI Documentation](https://platform.openai.com/docs) +- [Instructor Core Concepts](../concepts/index.md) +- [Type Validation Guide](../concepts/validation.md) +- [Advanced Usage Examples](../examples/index.md) + +## Updates and Compatibility + +Instructor maintains compatibility with the latest OpenAI API versions and models. Check the [changelog](../../CHANGELOG.md) for updates. + +### Environment Setup + +For production use, we recommend: +1. Using environment variables for API keys +2. Implementing proper error handling +3. Setting up monitoring for API usage +4. Regular updates of both OpenAI and Instructor packages diff --git a/docs/integrations/vertex.md b/docs/integrations/vertex.md new file mode 100644 index 000000000..f150cd2f8 --- /dev/null +++ b/docs/integrations/vertex.md @@ -0,0 +1,226 @@ +--- +title: "Vertex AI Integration with Instructor | Structured Output Guide" +description: "Complete guide to using Instructor with Google Cloud's Vertex AI. 
Learn how to generate structured, type-safe outputs with enterprise-grade AI capabilities." +--- + +# Vertex AI Integration with Instructor + +Google Cloud's Vertex AI provides enterprise-grade AI capabilities with robust scaling and security features. This guide shows you how to use Instructor with Vertex AI for type-safe, validated responses. + +## Quick Start + +Install Instructor with Vertex AI support: + +```bash +pip install "instructor[vertex]" +``` + +You'll also need the Google Cloud SDK and proper authentication: + +```bash +pip install google-cloud-aiplatform +``` + +## Simple User Example (Sync) + +```python +from vertexai.language_models import TextGenerationModel +import instructor +from pydantic import BaseModel + +# Initialize the model +model = TextGenerationModel.from_pretrained("text-bison@001") + +# Enable instructor patches +client = instructor.from_vertex(model) + +class User(BaseModel): + name: str + age: int + +# Create structured output +user = client.predict( + prompt="Extract: Jason is 25 years old", + response_model=User, +) + +print(user) # User(name='Jason', age=25) +``` + +## Simple User Example (Async) + +```python +from vertexai.language_models import TextGenerationModel +import instructor +from pydantic import BaseModel +import asyncio + +# Initialize the model +model = TextGenerationModel.from_pretrained("text-bison@001") + +# Enable instructor patches +client = instructor.from_vertex(model) + +class User(BaseModel): + name: str + age: int + +async def extract_user(): + user = await client.predict_async( + prompt="Extract: Jason is 25 years old", + response_model=User, + ) + return user + +# Run async function +user = asyncio.run(extract_user()) +print(user) # User(name='Jason', age=25) +``` + +## Nested Example + +```python +from pydantic import BaseModel +from typing import List + +class Address(BaseModel): + street: str + city: str + country: str + +class User(BaseModel): + name: str + age: int + addresses: List[Address] + +# Create structured output with nested objects +user = client.predict( + prompt=""" + Extract: Jason is 25 years old. + He lives at 123 Main St, New York, USA + and has a summer house at 456 Beach Rd, Miami, USA + """, + response_model=User, +) + +print(user) # User with nested Address objects +``` + +## Partial Streaming Example + +Note: Vertex AI's current API does not support partial streaming of responses. The streaming functionality returns complete responses in chunks rather than partial objects. We recommend using the standard synchronous or asynchronous methods for structured output generation. + +## Iterable Example + +```python +from typing import List + +class User(BaseModel): + name: str + age: int + +# Extract multiple users from text +users = client.predict_iterable( + prompt=""" + Extract users: + 1. Jason is 25 years old + 2. Sarah is 30 years old + 3. 
Mike is 28 years old + """, + response_model=User, +) + +for user in users: + print(user) # Prints each user as it's extracted +``` + +## Instructor Hooks + +Instructor provides several hooks to customize behavior: + +### Validation Hook + +```python +from instructor import Instructor + +def validation_hook(value, retry_count, exception): + print(f"Validation failed {retry_count} times: {exception}") + return retry_count < 3 # Retry up to 3 times + +instructor.patch(client, validation_hook=validation_hook) +``` + +### Mode Hooks + +```python +from instructor import Mode + +# Use different modes for different scenarios +client = instructor.patch(client, mode=Mode.JSON) # JSON mode +client = instructor.patch(client, mode=Mode.TOOLS) # Tools mode +client = instructor.patch(client, mode=Mode.MD_JSON) # Markdown JSON mode +``` + +### Custom Retrying + +```python +from instructor import RetryConfig + +client = instructor.patch( + client, + retry_config=RetryConfig( + max_retries=3, + on_retry=lambda *args: print("Retrying..."), + ) +) +``` + +## Available Models + +Vertex AI offers several model options: +- PaLM 2 for Text (text-bison) +- PaLM 2 for Chat (chat-bison) +- Codey for Code Generation +- Enterprise-specific models +- Custom-trained models + +## Best Practices + +1. **Model Selection** + - Choose model based on enterprise requirements + - Consider security and compliance needs + - Monitor quota and costs + - Use appropriate model versions + +2. **Optimization Tips** + - Structure prompts effectively + - Use appropriate temperature settings + - Implement caching strategies + - Monitor API usage + +3. **Error Handling** + - Implement proper validation + - Handle quota limits gracefully + - Monitor model responses + - Use appropriate timeout settings + +## Common Use Cases + +- Enterprise Data Processing +- Secure Content Generation +- Document Analysis +- Compliance-Aware Processing +- Large-Scale Deployments + +## Related Resources + +- [Vertex AI Documentation](https://cloud.google.com/vertex-ai/docs) +- [Instructor Core Concepts](../concepts/index.md) +- [Type Validation Guide](../concepts/validation.md) +- [Advanced Usage Examples](../examples/index.md) + +## Updates and Compatibility + +Instructor maintains compatibility with Vertex AI's latest API versions. Check the [changelog](../../CHANGELOG.md) for updates. + +Note: Some features like partial streaming may not be available due to API limitations. Always check the latest documentation for feature availability. 
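+
+### Authentication Setup
+
+The examples above assume your environment is already authenticated against Google Cloud. A minimal sketch of initializing the SDK before loading a model (the project ID and region are placeholders you would replace with your own values):
+
+```python
+import vertexai
+
+# Authenticate first (for example with `gcloud auth application-default login`),
+# then point the SDK at your project and region (placeholder values below).
+vertexai.init(project="your-gcp-project-id", location="us-central1")
+```
+
+Depending on your Instructor version, the Vertex AI entry point may be exposed as `instructor.from_vertexai` and expect a `GenerativeModel` from `vertexai.generative_models`; check the Instructor API reference if `from_vertex` is not available in your installation.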
diff --git a/mkdocs.yml b/mkdocs.yml index 38612fd5b..7b3dab981 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -190,14 +190,6 @@ nav: - Templating: 'concepts/templating.md' - Hub: - Introduction to Instructor Hub: 'hub/index.md' - - Structured Outputs with Vertex AI: 'hub/vertexai.md' - - Structured Outputs with Ollama: 'hub/ollama.md' - - Structured Outputs with llama-cpp-python: 'hub/llama-cpp-python.md' - - Structured Outputs with Together: 'hub/together.md' - - Structured Outputs with Anyscale: 'hub/anyscale.md' - - Structured Outputs with Groq: 'hub/groq.md' - - Structured Outputs with Mistral: 'hub/mistral.md' - - Structured Outputs with Cohere: 'hub/cohere.md' - Classification with Structured Outputs: 'hub/single_classification.md' - Bulk Classification with Structured Outputs: 'hub/multiple_classification.md' - Extracting Tables with Structured Outputs: 'hub/tables_from_vision.md' @@ -209,6 +201,21 @@ nav: - Generating Knowledge Graphs with Structured Outputs: 'hub/knowledge_graph.md' - Extracting Relevant Clips from YouTube Videos: "hub/youtube_clips.md" - Building Knowledge Graphs with Structured Outputs: 'tutorials/5-knowledge-graphs.ipynb' + - Integrations: + - Anyscale: 'integrations/anyscale.md' + - Anthropic: 'integrations/anthropic.md' + - Cerebras: 'integrations/cerebras.md' + - Cohere: 'integrations/cohere.md' + - Fireworks: 'integrations/fireworks.md' + - Google: 'integrations/google.md' + - Groq: 'integrations/groq.md' + - LiteLLM: 'integrations/litellm.md' + - llama-cpp-python: 'integrations/llama-cpp-python.md' + - Mistral: 'integrations/mistral.md' + - Ollama: 'integrations/ollama.md' + - OpenAI: 'integrations/openai.md' + - Together: 'integrations/together.md' + - Vertex AI: 'integrations/vertexai.md' - CLI Reference: - "CLI Reference": "cli/index.md" - "Finetuning GPT-3.5": "cli/finetune.md" @@ -286,12 +293,26 @@ plugins: - redirects: redirect_maps: jobs.md: https://jobs.applied-llms.org/ + 'hub/clients/vertexai.md': 'integrations/vertexai.md' + 'hub/clients/ollama.md': 'integrations/ollama.md' + 'hub/clients/openai.md': 'integrations/openai.md' + 'hub/clients/anthropic.md': 'integrations/anthropic.md' + 'hub/clients/anyscale.md': 'integrations/anyscale.md' + 'hub/clients/cohere.md': 'integrations/cohere.md' + 'hub/clients/fireworks.md': 'integrations/fireworks.md' + 'hub/clients/google.md': 'integrations/google.md' + 'hub/clients/litellm.md': 'integrations/litellm.md' + 'hub/clients/llama-cpp-python.md': 'integrations/llama-cpp-python.md' + 'hub/clients/mistral.md': 'integrations/mistral.md' + 'hub/clients/cerebras.md': 'integrations/cerebras.md' + 'hub/clients/groq.md': 'integrations/groq.md' + 'hub/clients/together.md': 'integrations/together.md' - mkdocs-jupyter: ignore_h1_titles: true execute: false - social - search: - separator: '[\s\u200b\-_,:!=\[\]()"`/]+|\.(?!\d)|&[lg]t;|(?!\b)(?=[A-Z][a-z])' + separator: '[\s\u200b\-_,:!=\[\]()"`/]+|\.(?!\b)(?=[A-Z][a-z])' - minify: minify_html: true - mkdocstrings: