diff --git a/docs/blog/index.md b/docs/blog/index.md index 690f6b310..e0e7a2b3e 100644 --- a/docs/blog/index.md +++ b/docs/blog/index.md @@ -47,14 +47,13 @@ If you want to get updates on new features and tips on how to use Instructor, yo ## Integrations and Tools -- [Ollama Integration](../hub/ollama.md) -- [llama-cpp-python Integration](../hub/llama-cpp-python.md) -- [Anyscale Integration](../hub/anyscale.md) -- [Together Compute Integration](../hub/together.md) +- [Ollama Integration](../integrations/ollama.md) +- [llama-cpp-python Integration](../integrations/llama-cpp-python.md) +- [Together Compute Integration](../integrations/together.md) - [Extracting Data into Pandas DataFrame using GPT-3.5 Turbo](../hub/pandas_df.md) - [Implementing Streaming Partial Responses with Field-Level Streaming](../hub/partial_streaming.md) ## Media and Resources - [Course: Structured Outputs with Instructor](https://www.wandb.courses/courses/steering-language-models?x=1) -- [Keynote: Pydantic is All You Need](posts/aisummit-2023.md) \ No newline at end of file +- [Keynote: Pydantic is All You Need](posts/aisummit-2023.md) diff --git a/docs/blog/posts/best_framework.md b/docs/blog/posts/best_framework.md index 5fd6a23c9..d2e34b504 100644 --- a/docs/blog/posts/best_framework.md +++ b/docs/blog/posts/best_framework.md @@ -32,7 +32,7 @@ from pydantic import BaseModel import instructor class User(BaseModel): - name: str + name: str age: int client = instructor.from_openai(openai.OpenAI()) @@ -42,7 +42,7 @@ user = client.chat.completions.create( response_model=User, # (1)! messages=[ { - "role": "user", + "role": "user", "content": "Extract the user's name and age from this: John is 25 years old" } ] @@ -63,14 +63,14 @@ Other features on instructor, in and out of the llibrary are: 2. Ability to use [Pydantic's validation context](../../concepts/reask_validation.md) 3. [Parallel Tool Calling](../../concepts/parallel.md) with correct types 4. Streaming [Partial](../../concepts/partial.md) and [Iterable](../../concepts/iterable.md) data. -5. Returning [Primitive](../../concepts/types.md) Types and [Unions](../../concepts/unions.md) as well! -6. Lots, and Lots of [Cookbooks](../../examples/index.md), [Tutorials](../../tutorials/1-introduction.ipynb), Documentation and even [instructor hub](../../hub/index.md) +5. Returning [Primitive](../../concepts/types.md) Types and [Unions](../../concepts/unions.md) as well! +6. Lots, and Lots of [Cookbooks](../../examples/index.md), [Tutorials](../../tutorials/1-introduction.ipynb), Documentation and even [instructor hub](../../integrations/index.md) ## Instructor's Broad Applicability One of the key strengths of Instructor is that it's designed as a lightweight patch over the official OpenAI Python SDK. This means it can be easily integrated not just with OpenAI's hosted API service, but with any provider or platform that exposes an interface compatible with the OpenAI SDK. -For example, providers like [Anyscale](../../hub/anyscale.md), [Together](../../hub/together.md), [Ollama](../../hub/ollama.md), [Groq](../../hub/groq.md), and [llama-cpp-python](../../hub/llama-cpp-python.md) all either use or mimic the OpenAI Python SDK under the hood. With Instructor's zero-overhead patching approach, teams can immediately start deriving structured data outputs from any of these providers. There's no need for custom integration work. 
+For example, providers like [Together](../../integrations/together.md), [Ollama](../../integrations/ollama.md), [Groq](../../integrations/groq.md), and [llama-cpp-python](../../integrations/llama-cpp-python.md) all either use or mimic the OpenAI Python SDK under the hood. With Instructor's zero-overhead patching approach, teams can immediately start deriving structured data outputs from any of these providers. There's no need for custom integration work. ## Direct access to the messages array @@ -84,4 +84,4 @@ This incremental, zero-overhead adoption path makes Instructor perfect for sprin And if you decide Instructor isn't a good fit after all, removing it is as simple as not applying the patch! The familiarity and flexibility of working directly with the OpenAI SDK is a core strength. -Instructor solves the "string hellll" of unstructured LLM outputs. It allows teams to easily realize the full potential of tools like GPTs by mapping their text to type-safe, validated data structures. If you're looking to get more structured value out of LLMs, give Instructor a try! \ No newline at end of file +Instructor solves the "string hellll" of unstructured LLM outputs. It allows teams to easily realize the full potential of tools like GPTs by mapping their text to type-safe, validated data structures. If you're looking to get more structured value out of LLMs, give Instructor a try! diff --git a/docs/blog/posts/introducing-structured-outputs.md b/docs/blog/posts/introducing-structured-outputs.md index a66737adf..b6f06ef42 100644 --- a/docs/blog/posts/introducing-structured-outputs.md +++ b/docs/blog/posts/introducing-structured-outputs.md @@ -41,7 +41,7 @@ In this article, we'll show how `instructor` addresses many of these challenges ### Limited Validation and Retry Logic -Validation is crucial for building reliable and effective applications. We want to catch errors in real time using `Pydantic` [validators](/concepts/reask_validation/) in order to allow our LLM to correct its responses on the fly. +Validation is crucial for building reliable and effective applications. We want to catch errors in real time using `Pydantic` [validators](../../concepts/reask_validation.md) in order to allow our LLM to correct its responses on the fly. Let's see an example of a simple validator below which ensures user names are always in uppercase. @@ -192,12 +192,13 @@ This built-in retry logic allows for targetted correction to the generated respo ### Real-time Streaming Validation -A common use-case is to define a single schema and extract multiple instances of it. With `instructor`, doing this is relatively straightforward by using [our `create_iterable` method](/concepts/lists/). +A common use-case is to define a single schema and extract multiple instances of it. With `instructor`, doing this is relatively straightforward by using [our `create_iterable` method](../../concepts/lists.md). ```python import instructor import openai from pydantic import BaseModel +``` client = instructor.from_openai(openai.OpenAI(), mode=instructor.Mode.TOOLS_STRICT) @@ -228,7 +229,7 @@ for user in users: #> name='John' age=10 ``` -Other times, we might also want to stream out information as it's dynamically generated into some sort of frontend component With `instructor`, you'll be able to do just that [using the `create_partial` method](/concepts/partial/). 
+Other times, we might also want to stream out information as it's dynamically generated into some sort of frontend component With `instructor`, you'll be able to do just that [using the `create_partial` method](../../concepts/partial.md). ```python import instructor @@ -375,4 +376,4 @@ While OpenAI's Structured Outputs shows promise, it has key limitations. The sys If you're interested in Structured Outputs, `instructor` addresses these critical issues. It provides automatic retries, real-time input validation, and multi-provider integration, allowing developers to more effectively implement Structured Outputs in their AI projects. -if you haven't given `instructor` a shot, try it today! \ No newline at end of file +if you haven't given `instructor` a shot, try it today! diff --git a/docs/blog/posts/open_source.md b/docs/blog/posts/open_source.md index 51c8f9ce5..ec5fee1db 100644 --- a/docs/blog/posts/open_source.md +++ b/docs/blog/posts/open_source.md @@ -17,11 +17,11 @@ tags: - API Integration --- -# Structured Output for Open Source and Local LLMs +# Structured Output for Open Source and Local LLMs Instructor has expanded its capabilities for language models. It started with API interactions via the OpenAI SDK, using [Pydantic](https://pydantic-docs.helpmanual.io/) for structured data validation. Now, Instructor supports multiple models and platforms. -The integration of [JSON mode](../../concepts/patching.md#json-mode) improved adaptability to vision models and open source alternatives. This allows support for models from [GPT](https://openai.com/api/) and [Mistral](https://mistral.ai) to models on [Ollama](https://ollama.ai) and [Hugging Face](https://huggingface.co/models), using [llama-cpp-python](../../hub/llama-cpp-python.md). +The integration of [JSON mode](../../concepts/patching.md#json-mode) improved adaptability to vision models and open source alternatives. This allows support for models from [GPT](https://openai.com/api/) and [Mistral](https://mistral.ai) to models on [Ollama](https://ollama.ai) and [Hugging Face](https://huggingface.co/models), using [llama-cpp-python](../../integrations/llama-cpp-python.md). Instructor now works with cloud-based APIs and local models for structured data extraction. Developers can refer to our guide on [Patching](../../concepts/patching.md) for information on using JSON mode with different models. @@ -40,7 +40,7 @@ OpenAI clients offer functionalities for different needs. We explore clients int ### Ollama: A New Frontier for Local Models -Ollama enables structured outputs with local models using JSON schema. See our [Ollama documentation](../../hub/ollama.md) for details. +Ollama enables structured outputs with local models using JSON schema. See our [Ollama documentation](../../integrations/ollama.md) for details. For setup and features, refer to the documentation. The [Ollama website](https://ollama.ai/download) provides resources, models, and support. 
@@ -68,6 +68,7 @@ client = instructor.from_openai( mode=instructor.Mode.JSON, ) + user = client.chat.completions.create( model="llama2", messages=[ @@ -93,7 +94,6 @@ Example of using llama-cpp-python for structured outputs: ```python import llama_cpp import instructor - from llama_cpp.llama_speculative import LlamaPromptLookupDecoding from pydantic import BaseModel @@ -111,9 +111,10 @@ llama = llama_cpp.Llama( create = instructor.patch( create=llama.create_chat_completion_openai_v1, - mode=instructor.Mode.JSON_SCHEMA, + mode=instructor.Mode.JSON_SCHEMA, ) + class UserDetail(BaseModel): name: str age: int @@ -131,56 +132,13 @@ user = create( print(user) #> name='Jason' age=30 -""" ``` ## Alternative Providers -### Anyscale - -Anyscale's Mistral model, as detailed in our [Anyscale documentation](../../hub/anyscale.md) and on [Anyscale's official documentation](https://docs.anyscale.com/), introduces the ability to obtain structured outputs using JSON schema. - -```bash -export ANYSCALE_API_KEY="your-api-key" -``` - -```python -import os -from openai import OpenAI -from pydantic import BaseModel -import instructor - - -class UserDetails(BaseModel): - name: str - age: int - - -# enables `response_model` in create call -client = instructor.from_openai( - OpenAI( - base_url="https://api.endpoints.anyscale.com/v1", - api_key=os.environ["ANYSCALE_API_KEY"], - ), - # This uses Anyscale's json schema output mode - mode=instructor.Mode.JSON_SCHEMA, -) - -resp = client.chat.completions.create( - model="mistralai/Mixtral-8x7B-Instruct-v0.1", - messages=[ - {"role": "system", "content": "You are a world class extractor"}, - {"role": "user", "content": 'Extract the following entities: "Jason is 20"'}, - ], - response_model=UserDetails, -) -print(resp) -#> name='Jason' age=20 -``` - ### Groq -Groq's platform, detailed further in our [Groq documentation](../../hub/groq.md) and on [Groq's official documentation](https://groq.com/), offers a unique approach to processing with its tensor architecture. This innovation significantly enhances the performance of structured output processing. +Groq's platform, detailed further in our [Groq documentation](../../integrations/groq.md) and on [Groq's official documentation](https://groq.com/), offers a unique approach to processing with its tensor architecture. This innovation significantly enhances the performance of structured output processing. ```bash export GROQ_API_KEY="your-api-key" @@ -188,15 +146,18 @@ export GROQ_API_KEY="your-api-key" ```python import os -import instructor -import groq from pydantic import BaseModel -client = qrog.Groq( +import groq +import instructor + + +client = groq.Groq( api_key=os.environ.get("GROQ_API_KEY"), ) -# By default, the patch function will patch the ChatCompletion.create and ChatCompletion.create methods to support the response_model parameter +# By default, the patch function will patch the ChatCompletion.create and ChatCompletion.create methods +# to support the response_model parameter client = instructor.from_openai(client, mode=instructor.Mode.MD_JSON) @@ -216,14 +177,14 @@ user: UserExtract = client.chat.completions.create( ) assert isinstance(user, UserExtract), "Should be instance of UserExtract" + print(user) #> name='jason' age=25 -""" ``` ### Together AI -Together AI, when combined with Instructor, offers a seamless experience for developers looking to leverage structured outputs in their applications. 
For more details, refer to our [Together AI documentation](../hub/together.md) and explore the [patching guide](../concepts/patching.md) to enhance your applications. +Together AI, when combined with Instructor, offers a seamless experience for developers looking to leverage structured outputs in their applications. For more details, refer to our [Together AI documentation](../../integrations/together.md) and explore the [patching guide](../../concepts/patching.md) to enhance your applications. ```bash export TOGETHER_API_KEY="your-api-key" @@ -231,9 +192,11 @@ export TOGETHER_API_KEY="your-api-key" ```python import os -import openai from pydantic import BaseModel + import instructor +import openai + client = openai.OpenAI( base_url="https://api.together.xyz/v1", @@ -242,6 +205,7 @@ client = openai.OpenAI( client = instructor.from_openai(client, mode=instructor.Mode.TOOLS) + class UserExtract(BaseModel): name: str age: int @@ -256,29 +220,33 @@ user: UserExtract = client.chat.completions.create( ) assert isinstance(user, UserExtract), "Should be instance of UserExtract" -print(user) +print(user) #> name='jason' age=25 ``` ### Mistral -For those interested in exploring the capabilities of Mistral Large with Instructor, we highly recommend checking out our comprehensive guide on [Mistral Large](../../hub/mistral.md). +For those interested in exploring the capabilities of Mistral Large with Instructor, we highly recommend checking out our comprehensive guide on [Mistral Large](../../integrations/mistral.md). ```python import instructor - from pydantic import BaseModel from mistralai.client import MistralClient + client = MistralClient() -patched_chat = instructor.from_openai(create=client.chat, mode=instructor.Mode.MISTRAL_TOOLS) +patched_chat = instructor.from_openai( + create=client.chat, mode=instructor.Mode.MISTRAL_TOOLS +) + class UserDetails(BaseModel): name: str age: int + resp = patched_chat( model="mistral-large-latest", response_model=UserDetails, @@ -289,6 +257,7 @@ resp = patched_chat( }, ], ) + print(resp) #> name='Jason' age=20 -``` \ No newline at end of file +``` diff --git a/docs/blog/posts/pairwise-llm-judge.md b/docs/blog/posts/pairwise-llm-judge.md index e6efad646..7be2dbf43 100644 --- a/docs/blog/posts/pairwise-llm-judge.md +++ b/docs/blog/posts/pairwise-llm-judge.md @@ -64,7 +64,7 @@ Next, we'll create a function that uses our LLM to judge the relevance between a ```python def judge_relevance(question: str, text: str) -> Judgment: return client.chat.create( - model="gpt-4o-mini", + model="gpt-4", messages=[ { "role": "system", @@ -102,8 +102,7 @@ def judge_relevance(question: str, text: str) -> Judgment: {{text}} """ - }, - }, + } ], response_model=Judgment, context={"question": question, "text": text}, @@ -134,7 +133,7 @@ if __name__ == "__main__": score += 1 print(f"Score: {score}/{len(test_pairs)}") - # > Score 9/10 + #> Score 9/10 ``` This test loop runs the judge on each pair and compares the result to a predetermined similarity value, calculating an overall score. 
diff --git a/docs/blog/posts/pydantic-is-still-all-you-need.md b/docs/blog/posts/pydantic-is-still-all-you-need.md index 3c2b397f8..3e2588a9d 100644 --- a/docs/blog/posts/pydantic-is-still-all-you-need.md +++ b/docs/blog/posts/pydantic-is-still-all-you-need.md @@ -39,7 +39,7 @@ Pydantic, combined with function calling, offers a superior alternative for stru - Validators to improve system reliability - Cleaner, more maintainable code -For more details on how Pydantic enhances data validation, check out our [Data Validation with Pydantic](../concepts/models.md) guide. +For more details on how Pydantic enhances data validation, check out our [Data Validation with Pydantic](../../concepts/models.md) guide. And here's the kicker: nothing's really changed in the past year. The core API is still just: @@ -63,7 +63,7 @@ Since last year: - Built a version in Rust - Seen 40% month-over-month growth in the Python library -We now support [Ollama](../../hub/ollama.md), [llama-cpp-python](../../hub/llama-cpp-python.md), [Anthropic](../../hub/anthropic.md), [Cohere](../../hub/cohere.md), [Google](../../hub/google.md), [Vertex AI](../../hub/vertexai.md), and more. As long as language models support function calling capabilities, this API will remain standard. +We now support [Ollama](../../integrations/ollama.md), [llama-cpp-python](../../integrations/llama-cpp-python.md), [Anthropic](../../integrations/anthropic.md), [Cohere](../../integrations/cohere.md), [Google](../../integrations/google.md), [Vertex AI](../../integrations/vertex.md), and more. As long as language models support function calling capabilities, this API will remain standard. ## Key Features @@ -123,4 +123,4 @@ Pydantic is still all you need for effective structured outputs with LLMs. It's As we continue to refine AI language models, keeping these principles in mind will lead to more robust, maintainable, and powerful applications. The future of AI isn't just about what the models can do, but how seamlessly we can integrate them into our existing software ecosystems. -For more advanced use cases and integrations, check out our [examples](../../examples/index.md) section, which covers various LLM providers and specialized implementations. \ No newline at end of file +For more advanced use cases and integrations, check out our [examples](../../examples/index.md) section, which covers various LLM providers and specialized implementations. diff --git a/docs/blog/posts/rag-timelines.md b/docs/blog/posts/rag-timelines.md index 71ff81b96..ba8eb0539 100644 --- a/docs/blog/posts/rag-timelines.md +++ b/docs/blog/posts/rag-timelines.md @@ -61,21 +61,23 @@ response = client.chat.completions.create( response_model=SearchQuery, messages=[ { - "role": "system", - "content": "You are a query generator for customer support tickets. The current date is 2024-02-17"}, + "role": "system", + "content": "You are a query generator for customer support tickets. The current date is 2024-02-17", + }, { - "role": "user", - "content": "Show me customer support tickets opened in the past week." 
+ "role": "user", + "content": "Show me customer support tickets opened in the past week.", }, ], ) +# Example response: { "query": "Show me customer support tickets opened in the past week.", "time_filter": { "start_date": "2024-02-10T00:00:00", - "end_date": "2024-02-17T00:00:00" - } + "end_date": "2024-02-17T00:00:00", + }, } ``` @@ -85,7 +87,7 @@ When working with time-based queries, it's important to consider the nuances of To handle this, you'll want to design your `TimeFilter` model to intelligently reason about these relative time periods. This could involve: -- Defaulting to the user's local timezone if available, or using a consistent default like UTC +- Defaulting to the user's local timezone if available, or using a consistent default like UTC - Defining clear rules for how to calculate the start and end of relative periods like "week" or "month" - e.g. does "past week" mean the last 7 days or the previous Sunday-Saturday range? - Allowing for flexibility in how users specify dates (exact datetimes, just dates, natural language phrases) @@ -97,4 +99,4 @@ By building this logic into the `TimeFilter` model, you can abstract away the co Of course, there may be edge cases or ambiguities that are hard to resolve programmatically. In these situations, you may need to prompt the user for clarification or make a best guess based on the available information. The key is to strive for a balance of flexibility and consistency in how you handle time-based queries, factoring in publication dates when relevant. -By modeling time filters with Pydantic and leveraging Instructor, RAG systems can effectively handle time-based queries. Clear prompts, careful model design, and appropriate parsing strategies enable accurate retrieval of information within specific time frames, enhancing the system's overall relevance and accuracy. \ No newline at end of file +By modeling time filters with Pydantic and leveraging Instructor, RAG systems can effectively handle time-based queries. Clear prompts, careful model design, and appropriate parsing strategies enable accurate retrieval of information within specific time frames, enhancing the system's overall relevance and accuracy. diff --git a/docs/blog/posts/using_json.md b/docs/blog/posts/using_json.md index 9d9743a3e..2e974ac84 100644 --- a/docs/blog/posts/using_json.md +++ b/docs/blog/posts/using_json.md @@ -21,7 +21,7 @@ tags: Large Language Models (LLMs) like GPT are incredibly powerful, but getting them to return well-formatted JSON can be challenging. This is where the Instructor library shines. Instructor allows you to easily map LLM outputs to JSON data using Python type annotations and Pydantic models. -Instructor makes it easy to get structured data like JSON from LLMs like GPT-3.5, GPT-4, GPT-4-Vision, and open-source models including [Mistral/Mixtral](../../hub/together.md), [Anyscale](../../hub/anyscale.md), [Ollama](../../hub/ollama.md), and [llama-cpp-python](../../hub/llama-cpp-python.md). +Instructor makes it easy to get structured data like JSON from LLMs like GPT-3.5, GPT-4, GPT-4-Vision, and open-source models including [Mistral/Mixtral](../../integrations/together.md), [Ollama](../../integrations/ollama.md), and [llama-cpp-python](../../integrations/llama-cpp-python.md). It stands out for its simplicity, transparency, and user-centric design, built on top of Pydantic. 
Instructor helps you manage [validation context](../../concepts/reask_validation.md), retries with [Tenacity](../../concepts/retrying.md), and streaming [Lists](../../concepts/lists.md) and [Partial](../../concepts/partial.md) responses. @@ -70,7 +70,7 @@ user = client.chat.completions.create( ) print(user.model_dump()) -# > { +# > { # "name": "John Doe", # "age": 25, # "email": "john@example.com" @@ -107,4 +107,4 @@ So while dictionaries can work for very simple JSON structures, Pydantic models ## JSON from LLMs Made Easy -Instructor and Pydantic together provide a fantastic way to extract and work with JSON data from LLMs. The lightweight patching of Instructor combined with the powerful validation and typing of Pydantic models makes it easy to integrate JSON outputs into your LLM-powered applications. Give Instructor a try and see how much easier it makes getting JSON from LLMs! \ No newline at end of file +Instructor and Pydantic together provide a fantastic way to extract and work with JSON data from LLMs. The lightweight patching of Instructor combined with the powerful validation and typing of Pydantic models makes it easy to integrate JSON outputs into your LLM-powered applications. Give Instructor a try and see how much easier it makes getting JSON from LLMs! diff --git a/docs/blog/posts/version-1.md b/docs/blog/posts/version-1.md index 13f9a3487..064203eaf 100644 --- a/docs/blog/posts/version-1.md +++ b/docs/blog/posts/version-1.md @@ -42,7 +42,7 @@ import instructor client = instructor.from_openai(openai.OpenAI()) ``` -Except now, any default arguments you want to place into the `create` call will be passed to the client. via kwargs. +Except now, any default arguments you want to place into the `create` call will be passed to the client. via kwargs. IF you know you want to pass in tempurature, seed, or model, you can do so. @@ -52,15 +52,15 @@ import openai import instructor client = instructor.from_openai( - openai.OpenAI(), - model="gpt-4-turbo-preview", + openai.OpenAI(), + model="gpt-4-turbo-preview", temperature=0.2 ) ``` -Now, whenever you call `client.chat.completions.create` the `model` and `temperature` will be passed to the openai client! +Now, whenever you call `client.chat.completions.create` the `model` and `temperature` will be passed to the openai client! -## No new Standards +## No new Standards When I first started working on this project, my goal was to ensure that we weren't introducing any new standards. Instead, our focus was on maintaining compatibility with existing ones. By creating our own client, we can seamlessly proxy OpenAI's `chat.completions.create` and Anthropic's `messages.create` methods. This approach allows us to provide a smooth upgrade path for your client, enabling support for all the latest models and features as they become available. Additionally, this strategy safeguards us against potential downstream changes. 
@@ -69,6 +69,9 @@ import openai import anthropic import litellm import instructor +from typing import TypeVar + +T = TypeVar("T") # These are all ways to create a client client = instructor.from_openai(openai.OpenAI()) @@ -76,10 +79,10 @@ client = instructor.from_anthropic(anthropic.Anthropic()) client = instructor.from_litellm(litellm.completion) # all of these will route to the same underlying create function -# allow you to add instructor to try it out, while easily removing it -client.create(..., response_model=Type[T]) -> T -client.chat.completions.create(..., response_model=Type[T]) -> T -client.messages.create(..., response_model=Type[T]) -> T +# allow you to add instructor to try it out, while easily removing it +client.create(model="gpt-4", response_model=type[T]) -> T +client.chat.completions.create(model="gpt-4", response_model=type[T]) -> T +client.messages.create(model="gpt-4", response_model=type[T]) -> T ``` ## Type are infered correctly @@ -114,7 +117,7 @@ Now if you use a ID, you can see the type is correctly infered. ### Handling async: `await create` -This will also work correctly with asynchronous clients. +This will also work correctly with asynchronous clients. ```python import openai @@ -253,7 +256,7 @@ Instructor has always supported validation and error handling. But now, we've ad If you want to learn more check out the docs on [retrying](../../concepts/retrying.md) and [reasking](../../concepts/reask_validation.md) -## Support in multiple languages +## Support in multiple languages While each flavor is different the core philosophy is the same. Keeping it as close as possible to the common api allows us to support all the same features in all the same languages by hooking into each libraries's popular validation libraries. @@ -263,4 +266,4 @@ Check out: - [Elixir](https://github.com/instructor-ai/instructor-elixir) - [PHP](https://github.com/cognesy/instructor-php) -If you're interested in contributing, check out the [contributing guide](../../contributing.md), and you want to create instructor in your language, let [me](https://twitter.com/jxnlco) know and I can help with promotion and connecting all the docs! \ No newline at end of file +If you're interested in contributing, check out the [contributing guide](../../contributing.md), and you want to create instructor in your language, let [me](https://twitter.com/jxnlco) know and I can help with promotion and connecting all the docs! 
diff --git a/docs/blog/posts/youtube-flashcards.md b/docs/blog/posts/youtube-flashcards.md index b12f3d5f3..3074ef502 100644 --- a/docs/blog/posts/youtube-flashcards.md +++ b/docs/blog/posts/youtube-flashcards.md @@ -54,28 +54,22 @@ import uuid from pydantic import BaseModel, Field from pydantic.json_schema import SkipJsonSchema + class QuestionAnswer(BaseModel): question: str = Field(description="Question about the topic") options: list[str] = Field( - description="Potential answers to the question.", - min_items=3, - max_items=5 + description="Potential answers to the question.", min_items=3, max_items=5 ) answer_index: int = Field( - description="Index of the correct answer options (starting from 0).", - ge=0, - lt=5 + description="Index of the correct answer options (starting from 0).", ge=0, lt=5 ) difficulty: int = Field( description="Difficulty of this question from 1 to 5, 5 being the most difficult.", gt=0, - le=5, + le=5, ) youtube_url: SkipJsonSchema[str | None] = None - id: uuid.UUID = Field( - description="Unique identifier", - default_factory=uuid.uuid4 - ) + id: uuid.UUID = Field(description="Unique identifier", default_factory=uuid.uuid4) ``` This examples shows several `instructor` features: @@ -98,10 +92,10 @@ We use `youtube-transcript-api` to get the full transcript of a video. ```python from youtube_transcript_api import YouTubeTranscriptApi -youtube_url = "https://www.youtube.com/watch?v=hqutVJyd3TI" +youtube_url = "https://www.youtube.com/watch?v=hqutVJyd3TI" _, _, video_id = youtube_url.partition("?v=") segments = YouTubeTranscriptApi.get_transcript(video_id) -transcript = " ".join([s['text'] for s in segments]) +transcript = " ".join([s["text"] for s in segments]) ``` ### 3. Generate question-answer pairs @@ -191,7 +185,9 @@ from burr.core import action, State @action(reads=[], writes=["youtube_url"]) def process_user_input(state: State, user_input: str) -> State: """Process user input and update the YouTube URL.""" - youtube_url = user_input # In practice, we would have more complex validation logic. + youtube_url = ( + user_input # In practice, we would have more complex validation logic. + ) return state.update(youtube_url=youtube_url) @@ -199,10 +195,10 @@ def process_user_input(state: State, user_input: str) -> State: def get_youtube_transcript(state: State) -> State: """Get the official YouTube transcript for a video given it's URL""" youtube_url = state["youtube_url"] - + _, _, video_id = youtube_url.partition("?v=") transcript = YouTubeTranscriptApi.get_transcript(video_id) - full_transcript = " ".join([entry['text'] for entry in transcript]) + full_transcript = " ".join([entry["text"] for entry in transcript]) # store the transcript in state return state.update(transcript=full_transcript, youtube_url=youtube_url) diff --git a/docs/blog/posts/youtube-transcripts.md b/docs/blog/posts/youtube-transcripts.md index 8f8e84788..83ee75831 100644 --- a/docs/blog/posts/youtube-transcripts.md +++ b/docs/blog/posts/youtube-transcripts.md @@ -29,7 +29,7 @@ In this post, we'll show you how to summarise Youtube video transcripts into dis By the end of this article, you'll be able to build an application as per the video below. 
-![](../../hub/img/youtube.gif) +![](../../img/youtube.gif) diff --git a/docs/concepts/index.md b/docs/concepts/index.md index 37d0e5c89..d8cbafe39 100644 --- a/docs/concepts/index.md +++ b/docs/concepts/index.md @@ -1,5 +1,5 @@ --- -title: Understanding Instructor: Key Concepts for Structured Outputs in AI +title: Key Concepts for Structured Outputs in AI description: Explore essential concepts in Instructor for efficient extraction and validation of structured data from AI models. --- diff --git a/docs/concepts/iterable.md b/docs/concepts/iterable.md new file mode 100644 index 000000000..c55e8c4ca --- /dev/null +++ b/docs/concepts/iterable.md @@ -0,0 +1,172 @@ +--- +title: Extracting Structured Data with Iterable and Streaming in Python +description: Learn to use Iterable and streaming for structured data extraction with Pydantic and OpenAI in Python. +--- + +# Multi-task and Streaming + +A common use case of structured extraction is defining a single schema class and then making another schema to create a list to do multiple extraction + +```python +from typing import List +from pydantic import BaseModel + + +class User(BaseModel): + name: str + age: int + + +class Users(BaseModel): + users: List[User] + + +print(Users.model_json_schema()) +""" +{ + '$defs': { + 'User': { + 'properties': { + 'name': {'title': 'Name', 'type': 'string'}, + 'age': {'title': 'Age', 'type': 'integer'}, + }, + 'required': ['name', 'age'], + 'title': 'User', + 'type': 'object', + } + }, + 'properties': { + 'users': {'items': {'$ref': '#/$defs/User'}, 'title': 'Users', 'type': 'array'} + }, + 'required': ['users'], + 'title': 'Users', + 'type': 'object', +} +""" +``` + +Defining a task and creating a list of classes is a common enough pattern that we make this convenient by making use of `Iterable[T]`. This lets us dynamically create a new class that: + +1. Has dynamic docstrings and class name based on the task +2. Support streaming by collecting tokens until a task is received back out. + +## Extracting Tasks using Iterable + +By using `Iterable` you get a very convenient class with prompts and names automatically defined: + +```python +import instructor +from openai import OpenAI +from typing import Iterable +from pydantic import BaseModel + +client = instructor.from_openai(OpenAI(), mode=instructor.function_calls.Mode.JSON) + + +class User(BaseModel): + name: str + age: int + + +users = client.chat.completions.create( + model="gpt-3.5-turbo-1106", + temperature=0.1, + response_model=Iterable[User], + stream=False, + messages=[ + { + "role": "user", + "content": "Consider this data: Jason is 10 and John is 30.\ + Correctly segment it into entitites\ + Make sure the JSON is correct", + }, + ], +) +for user in users: + print(user) + #> name='Jason' age=10 + #> name='John' age=30 +``` + +## Streaming Tasks + +We can also generate tasks as the tokens are streamed in by defining an `Iterable[T]` type. 
+ +Lets look at an example in action with the same class + +```python hl_lines="6 26" +import instructor +import openai +from typing import Iterable +from pydantic import BaseModel + +client = instructor.from_openai(openai.OpenAI(), mode=instructor.Mode.TOOLS) + + +class User(BaseModel): + name: str + age: int + + +users = client.chat.completions.create( + model="gpt-4", + temperature=0.1, + stream=True, + response_model=Iterable[User], + messages=[ + { + "role": "system", + "content": "You are a perfect entity extraction system", + }, + { + "role": "user", + "content": (f"Extract `Jason is 10 and John is 10`"), + }, + ], + max_tokens=1000, +) + +for user in users: + print(user) + #> name='Jason' age=10 + #> name='John' age=10 +``` + +## Asynchronous Streaming + +I also just want to call out in this example that `instructor` also supports asynchronous streaming. This is useful when you want to stream a response model and process the results as they come in, but you'll need to use the `async for` syntax to iterate over the results. + +```python +import instructor +import openai +from typing import Iterable +from pydantic import BaseModel + +client = instructor.from_openai(openai.AsyncOpenAI(), mode=instructor.Mode.TOOLS) + + +class UserExtract(BaseModel): + name: str + age: int + + +async def print_iterable_results(): + model = await client.chat.completions.create( + model="gpt-4", + response_model=Iterable[UserExtract], + max_retries=2, + stream=True, + messages=[ + {"role": "user", "content": "Make two up people"}, + ], + ) + async for m in model: + print(m) + #> name='John Doe' age=25 + #> name='Jane Doe' age=28 + + +import asyncio + +asyncio.run(print_iterable_results()) +``` diff --git a/docs/concepts/patching.md b/docs/concepts/patching.md index a6f5b3bf8..25edb4f59 100644 --- a/docs/concepts/patching.md +++ b/docs/concepts/patching.md @@ -49,7 +49,7 @@ client = instructor.from_gemini( This method allows us to get structured output from Gemini via tool calling with the Vertex AI SDK. -**Note:** Gemini Tool Calling is in preview and there are some limitations, you can learn more in the [Vertex AI examples notebook](../hub/vertexai.md). +**Note:** Gemini Tool Calling is in preview and there are some limitations, you can learn more in the [Vertex AI examples notebook](../integrations/vertex.md). ```python import instructor diff --git a/docs/concepts/prompting.md b/docs/concepts/prompting.md index 75e3e7bc2..79c902c8f 100644 --- a/docs/concepts/prompting.md +++ b/docs/concepts/prompting.md @@ -15,7 +15,7 @@ The overarching theme of using Instructor and Pydantic for function calling is t - **Entity Relationships**: Define explicit identifiers and relationship fields. - **Contextual Logic**: Optionally add a "chain of thought" field in reusable components for extra context. -## Modular Chain of Thought +## Modular Chain of Thought {#chain-of-thought} This approach to "chain of thought" improves data quality but can have modular components rather than global CoT. 
@@ -120,6 +120,8 @@ class UserDetail(BaseModel): ) ``` +## Literals {#literals} + If you're having a hard time with `Enum` an alternative is to use `Literal` ```python hl_lines="4" diff --git a/docs/concepts/unions.md b/docs/concepts/unions.md new file mode 100644 index 000000000..2db0650d6 --- /dev/null +++ b/docs/concepts/unions.md @@ -0,0 +1,168 @@ +# Working with Union Types in Instructor + +This guide explains how to work with union types in Instructor, allowing you to handle multiple possible response types from language models. + +## Basic Union Types + +Union types let you specify that a field can be one of several types: + +```python +from typing import Union +from pydantic import BaseModel + +class Response(BaseModel): + value: Union[str, int] # Can be either string or integer +``` + +## Discriminated Unions + +Use discriminated unions to handle different response types: + +```python +from typing import Literal, Union +from pydantic import BaseModel + +class UserQuery(BaseModel): + type: Literal["user"] + username: str + +class SystemQuery(BaseModel): + type: Literal["system"] + command: str + +Query = Union[UserQuery, SystemQuery] + +# Usage with Instructor +response = client.chat.completions.create( + model="gpt-3.5-turbo", + response_model=Query, + messages=[{"role": "user", "content": "Parse: user lookup jsmith"}] +) +``` + +## Optional Fields + +Combine Union with Optional for nullable fields: + +```python +from typing import Optional +from pydantic import BaseModel + +class User(BaseModel): + name: str + email: Optional[str] = None # Same as Union[str, None] +``` + +## Best Practices + +1. **Type Hints**: Use proper type hints for clarity +2. **Discriminators**: Add discriminator fields for complex unions +3. **Validation**: Add validators for union fields +4. 
**Documentation**: Document expected types clearly + +## Common Patterns + +### Multiple Response Types +```python +from typing import Union, Literal +from pydantic import BaseModel + +class SuccessResponse(BaseModel): + status: Literal["success"] + data: dict + +class ErrorResponse(BaseModel): + status: Literal["error"] + message: str + +Response = Union[SuccessResponse, ErrorResponse] +``` + +### Nested Unions +```python +from typing import Union, List +from pydantic import BaseModel + +class TextContent(BaseModel): + type: Literal["text"] + text: str + +class ImageContent(BaseModel): + type: Literal["image"] + url: str + +class Message(BaseModel): + content: List[Union[TextContent, ImageContent]] +``` + +## Integration with Instructor + +### Validation with Unions +```python +from instructor import patch +from openai import OpenAI + +client = patch(OpenAI()) + +def validate_response(response: Response) -> bool: + if isinstance(response, ErrorResponse): + return len(response.message) > 0 + return True + +result = client.chat.completions.create( + model="gpt-3.5-turbo", + response_model=Response, + validation_hook=validate_response, + messages=[{"role": "user", "content": "Process this request"}] +) +``` + +### Streaming with Unions +```python +def stream_content(): + response = client.chat.completions.create( + model="gpt-3.5-turbo", + response_model=Message, + stream=True, + messages=[{"role": "user", "content": "Generate mixed content"}] + ) + for partial in response: + if partial.content: + for item in partial.content: + if isinstance(item, TextContent): + print(f"Text: {item.text}") + elif isinstance(item, ImageContent): + print(f"Image: {item.url}") +``` + +## Error Handling + +Handle union type validation errors: + +```python +from pydantic import ValidationError + +try: + response = Response( + status="invalid", # Invalid status + data={"key": "value"} + ) +except ValidationError as e: + print(f"Validation error: {e}") +``` + +## Type Checking + +Use isinstance() for runtime type checking: + +```python +def process_response(response: Response): + if isinstance(response, SuccessResponse): + # Handle success case + process_data(response.data) + elif isinstance(response, ErrorResponse): + # Handle error case + log_error(response.message) +``` + +For more information about union types, check out the [Pydantic documentation on unions](https://docs.pydantic.dev/latest/concepts/types/#unions). diff --git a/docs/concepts/validation.md b/docs/concepts/validation.md new file mode 100644 index 000000000..010039a8a --- /dev/null +++ b/docs/concepts/validation.md @@ -0,0 +1,150 @@ +# Validation in Instructor + +This guide covers validation concepts and best practices when using Instructor for structured outputs. + +## Overview + +Validation in Instructor ensures that the output from language models matches your expected schema. This is crucial for: +- Data consistency +- Error handling +- Type safety +- Business logic enforcement + +## Basic Validation + +Instructor uses Pydantic for validation, which provides: +1. Type checking +2. Data coercion +3. Custom validators +4. Field constraints + +```python +from pydantic import BaseModel, Field, validator +from typing import List + +class User(BaseModel): + name: str = Field(..., min_length=2) + age: int = Field(..., ge=0, le=150) + emails: List[str] + + @validator('emails') + def validate_emails(cls, v): + if not all('@' in email for email in v): + raise ValueError('Invalid email format') + return v +``` + +## Validation Strategies + +### 1. 
Field Validation + +Use Field() for basic constraints: +```python +class Product(BaseModel): + name: str = Field(..., min_length=1, max_length=100) + price: float = Field(..., gt=0) + quantity: int = Field(..., ge=0) +``` + +### 2. Custom Validators + +Use @validator for complex validation: +```python +class Order(BaseModel): + items: List[str] + total: float + + @validator('total') + def validate_total(cls, v, values): + if v < 0: + raise ValueError('Total cannot be negative') + return v +``` + +### 3. Pre-validation Hooks + +Use pre-validation hooks for data transformation: +```python +class UserProfile(BaseModel): + username: str + + @validator('username', pre=True) + def lowercase_username(cls, v): + return v.lower() +``` + +## Error Handling + +Instructor provides robust error handling for validation failures: + +```python +from instructor import patch +import openai + +client = patch(openai.OpenAI()) + +try: + user = client.chat.completions.create( + model="gpt-3.5-turbo", + response_model=User, + messages=[{"role": "user", "content": "Extract: John Doe, age: -5"}] + ) +except ValueError as e: + print(f"Validation error: {e}") +``` + +## Best Practices + +1. **Start Simple**: Begin with basic type validation before adding complex rules +2. **Use Type Hints**: Always specify types for better code clarity +3. **Document Constraints**: Add clear descriptions to Field() definitions +4. **Handle Errors**: Implement proper error handling for validation failures +5. **Test Edge Cases**: Verify validation works with unexpected inputs + +## Common Patterns + +### Optional Fields +```python +class Profile(BaseModel): + name: str + bio: Optional[str] = None +``` + +### Nested Validation +```python +class Address(BaseModel): + street: str + city: str + country: str + +class User(BaseModel): + name: str + addresses: List[Address] +``` + +### Complex Validation +```python +class Transaction(BaseModel): + amount: float + currency: str + timestamp: datetime + + @validator('currency') + def validate_currency(cls, v): + valid_currencies = ['USD', 'EUR', 'GBP'] + if v not in valid_currencies: + raise ValueError(f'Currency must be one of {valid_currencies}') + return v +``` + +## Related Resources + +- [Pydantic Documentation](https://docs.pydantic.dev/) +- [OpenAI Function Calling](https://platform.openai.com/docs/guides/gpt/function-calling) +- [Instructor Examples](../examples/index.md) + +## Updates and Compatibility + +- Works with all supported LLM providers +- Compatible with latest Pydantic versions +- Regular updates for new validation features diff --git a/docs/examples/classification.md b/docs/examples/classification.md index 87fc05608..442cc1029 100644 --- a/docs/examples/classification.md +++ b/docs/examples/classification.md @@ -333,4 +333,4 @@ print(f"Predicted Labels: {prediction.class_labels}") #> Predicted Labels: ['TECH_ISSUE', 'BILLING'] ``` -By using Literals and including few-shot examples, we've improved both the single-label and multi-label classification implementations. These changes enhance type safety and provide better guidance for the AI model, potentially leading to more accurate classifications. \ No newline at end of file +By using Literals and including few-shot examples, we've improved both the single-label and multi-label classification implementations. These changes enhance type safety and provide better guidance for the AI model, potentially leading to more accurate classifications. 
diff --git a/docs/examples/index.md b/docs/examples/index.md index 6058577c8..238f8946e 100644 --- a/docs/examples/index.md +++ b/docs/examples/index.md @@ -11,7 +11,7 @@ Welcome to our collection of cookbooks showcasing the power of structured output 1. [Enum-Based Classification](classification.md): Implement structured classification using Python enums with AI models. 2. [AI Self-Assessment and Correction](self_critique.md): Explore techniques for AI models to evaluate and improve their own outputs. -3. [Efficient Batch Classification](batch_classification.md): Process multiple items simultaneously for improved performance. +3. [Efficient Batch Classification](bulk_classification.md): Process multiple items simultaneously for improved performance. 4. [Precise Citation Extraction](exact_citations.md): Accurately retrieve and format citations from text using AI. 5. [Search Query Segmentation](search.md): Break down complex search queries into structured components for better understanding. 6. [Dynamic Knowledge Graph Generation](knowledge_graph.md): Create visual representations of information relationships using AI. diff --git a/docs/examples/recursive.md b/docs/examples/recursive.md new file mode 100644 index 000000000..3d973c64f --- /dev/null +++ b/docs/examples/recursive.md @@ -0,0 +1,127 @@ +--- +title: Working with Recursive Schemas in Instructor +description: Learn how to effectively implement and use recursive Pydantic models for handling nested and hierarchical data structures. +--- + +# Recursive Schema Implementation Guide + +This guide demonstrates how to work with recursive schemas in Instructor using Pydantic models. While flat schemas are often simpler to work with, some use cases require recursive structures to represent hierarchical data effectively. + +!!! tips "Motivation" + Recursive schemas are particularly useful when dealing with: + * Nested organizational structures + * File system hierarchies + * Comment threads with replies + * Task dependencies with subtasks + * Abstract syntax trees + +## Defining a Recursive Schema + +Here's an example of how to define a recursive Pydantic model: + +```python +from typing import List, Optional +from pydantic import BaseModel, Field + +class RecursiveNode(BaseModel): + """A node that can contain child nodes of the same type.""" + + name: str = Field(..., description="Name of the node") + value: Optional[str] = Field(None, description="Optional value associated with the node") + children: List["RecursiveNode"] = Field( + default_factory=list, + description="List of child nodes" + ) + +# Required for recursive Pydantic models +RecursiveNode.model_rebuild() +``` + +## Example Usage + +Let's see how to use this recursive schema with Instructor: + +```python +import instructor +from openai import OpenAI + +client = instructor.from_openai(OpenAI()) + +def parse_hierarchy(text: str) -> RecursiveNode: + """Parse text into a hierarchical structure.""" + return client.chat.completions.create( + model="gpt-4", + messages=[ + { + "role": "system", + "content": "You are an expert at parsing text into hierarchical structures." 
+ }, + { + "role": "user", + "content": f"Parse this text into a hierarchical structure: {text}" + } + ], + response_model=RecursiveNode + ) + +# Example usage +hierarchy = parse_hierarchy(""" +Company: Acme Corp +- Department: Engineering + - Team: Frontend + - Project: Website Redesign + - Project: Mobile App + - Team: Backend + - Project: API v2 + - Project: Database Migration +- Department: Marketing + - Team: Digital + - Project: Social Media Campaign + - Team: Brand + - Project: Logo Refresh +""") +``` + +## Validation and Best Practices + +When working with recursive schemas: + +1. Always call `model_rebuild()` after defining the model +2. Consider adding validation for maximum depth to prevent infinite recursion +3. Use type hints properly to maintain code clarity +4. Consider implementing custom validators for specific business rules + +```python +from pydantic import model_validator + +class RecursiveNodeWithDepth(RecursiveNode): + @model_validator(mode='after') + def validate_depth(self) -> "RecursiveNodeWithDepth": + def check_depth(node: "RecursiveNodeWithDepth", current_depth: int = 0) -> int: + if current_depth > 10: # Maximum allowed depth + raise ValueError("Maximum depth exceeded") + return max( + [check_depth(child, current_depth + 1) for child in node.children], + default=current_depth + ) + + check_depth(self) + return self +``` + +## Performance Considerations + +While recursive schemas are powerful, they can be more challenging for language models to handle correctly. Consider these tips: + +1. Keep structures as shallow as possible +2. Use clear naming conventions +3. Provide good examples in your prompts +4. Consider breaking very large structures into smaller chunks + +## Conclusion + +Recursive schemas provide a powerful way to handle hierarchical data structures in your applications. While they require more careful handling than flat schemas, they can be invaluable for certain use cases. + +For more examples of working with complex data structures, check out: +1. [Query Planning with Dependencies](planning-tasks.md) +2. [Knowledge Graph Generation](knowledge_graph.md) diff --git a/docs/hub/anthropic.md b/docs/hub/anthropic.md deleted file mode 100644 index 7ba83f59b..000000000 --- a/docs/hub/anthropic.md +++ /dev/null @@ -1,70 +0,0 @@ ---- -title: Integrating Anthropic with Instructor Client for Enhanced User Modeling -description: Learn how to combine Anthropic and Instructor clients to create user models with complex properties in Python. ---- - -# Anthropic - -Now that we have a [Anthropic](https://www.anthropic.com/) client, we can use it with the `instructor` client to make requests. 
- -``` -pip install anthropic -``` - -```python -from pydantic import BaseModel -from typing import List -import anthropic -import instructor - -# Patching the Anthropics client with the instructor for enhanced capabilities -client = instructor.from_anthropic( - anthropic.Anthropic(), -) - - -class Properties(BaseModel): - name: str - value: str - - -class User(BaseModel): - name: str - age: int - properties: List[Properties] - - -# client.messages.create will also work due to the instructor client -user_response = client.chat.completions.create( - model="claude-3-haiku-20240307", - max_tokens=1024, - max_retries=0, - messages=[ - { - "role": "user", - "content": "Create a user for a model with a name, age, and properties.", - } - ], - response_model=User, -) # type: ignore - -print(user_response.model_dump_json(indent=2)) -""" -{ - "name": "John Doe", - "age": 35, - "properties": [ - { - "name": "City", - "value": "New York" - }, - { - "name": "Occupation", - "value": "Software Engineer" - } - ] -} -""" -``` - -We're encountering challenges with deeply nested types and eagerly invite the community to test, provide feedback, and suggest necessary improvements as we enhance the anthropic client's support. \ No newline at end of file diff --git a/docs/hub/anyscale.md b/docs/hub/anyscale.md deleted file mode 100644 index d61abdff6..000000000 --- a/docs/hub/anyscale.md +++ /dev/null @@ -1,83 +0,0 @@ ---- -draft: False -date: 2023-12-15 -slug: anyscale -tags: - - patching - - open source -authors: - - anmol - - jxnl ---- - -# Structured Outputs with Anyscale - -If you want to try this example using `instructor hub`, you can pull it by running - -```bash -instructor hub pull --slug anyscale --py > anyscale_example.py -``` - -Open-source LLMS are gaining popularity, and the release of Anyscale's Mistral model has made it possible to obtain structured outputs using JSON schema at any scale. Instead of relying on a model's default output mode, you can utilize JSON schema to obtain structured outputs. This approach is a time-saving alternative to extensive prompt engineering. - -By the end of this blog post, you will learn how to effectively utilize the instructor at any scale. But before we proceed, let's first explore the concept of patching. - - - -## Patching - -Instructor's patch enhances a openai api it with the following features: - -- `response_model` in `create` calls that returns a pydantic model -- `max_retries` in `create` calls that retries the call if it fails by using a backoff strategy - -!!! note "Learn More" - - To learn more, please refer to the [docs](../index.md). To understand the benefits of using Pydantic with Instructor, visit the tips and tricks section of the [why use Pydantic](../why.md) page. - -## Anyscale - -The good news is that Anyscale employs the same OpenAI client, and its models support some of these output modes too! - -!!! note "Getting access" - - If you want to try this out for yourself check out the [Anyscale](https://anyscale.com/) website. You can get started [here](https://docs.anyscale.com/get-started). - -Let's explore one of the models available in Anyscale's extensive collection! 
- -```python -from openai import OpenAI -from pydantic import BaseModel -import os -import instructor - - -class UserDetails(BaseModel): - name: str - age: int - - -# enables `response_model` in create call -client = instructor.from_openai( - OpenAI( - base_url="https://api.endpoints.anyscale.com/v1", - api_key=os.environ["ANYSCALE_API_KEY"], - ), - # This uses Anyscale's json schema output mode - mode=instructor.Mode.JSON_SCHEMA, -) - -resp = client.chat.completions.create( - model="mistralai/Mixtral-8x7B-Instruct-v0.1", - messages=[ - {"role": "system", "content": "You are a world class extractor"}, - {"role": "user", "content": 'Extract the following entities: "Jason is 20"'}, - ], - response_model=UserDetails, -) -print(resp) -#> name='Jason' age=20 -# # > name='Jason' age=20 -``` - -You can find more information about Anyscale's output mode support [here](https://docs.endpoints.anyscale.com/). diff --git a/docs/hub/groq.md b/docs/hub/groq.md deleted file mode 100644 index 7d112daa4..000000000 --- a/docs/hub/groq.md +++ /dev/null @@ -1,88 +0,0 @@ ---- -title: Structured Outputs with Groq AI and Pydantic -description: Learn how to use Groq AI for structured outputs with Pydantic in Python and enhance API interactions. ---- - -# Structured Outputs with Groq AI - -If you want to try this example using `instructor hub`, you can pull it by running - -```bash -instructor hub pull --slug groq --py > groq_example.py -``` - -you'll need to sign up for an account and get an API key. You can do that [here](https://console.groq.com/docs/quickstart). - -```bash -export GROQ_API_KEY= -pip install groq -``` - -!!! note "Other Languages" - - This blog post is written in Python, but the concepts are applicable to other languages as well, as we currently have support for [Javascript](https://instructor-ai.github.io/instructor-js), [Elixir](https://hexdocs.pm/instructor/Instructor.html) and [PHP](https://github.com/cognesy/instructor-php/). - - - -## Patching - -Instructor's patch enhances the openai api it with the following features: - -- `response_model` in `create` calls that returns a pydantic model -- `max_retries` in `create` calls that retries the call if it fails by using a backoff strategy - -!!! note "Learn More" - - To learn more, please refer to the [docs](../index.md). To understand the benefits of using Pydantic with Instructor, visit the tips and tricks section of the [why use Pydantic](../why.md) page. - -## Groq AI - -While Groq AI does not support function calling directly, you can still leverage the TOOLS mode for structured outputs. - -!!! 
note "Getting access" - - If you want to try this out for yourself check out the [docs](https://console.groq.com/docs/quickstart) - - -```python -import os -import instructor - -from groq import Groq -from pydantic import BaseModel - -client = Groq( - api_key=os.environ.get("GROQ_API_KEY"), -) - -# By default, the patch function will patch the ChatCompletion.create and ChatCompletion.create methods to support the response_model parameter -client = instructor.from_groq(client, mode=instructor.Mode.TOOLS) - - -# Now, we can use the response_model parameter using only a base model -# rather than having to use the OpenAISchema class -class UserExtract(BaseModel): - name: str - age: int - - -user: UserExtract = client.chat.completions.create( - model="mixtral-8x7b-32768", - response_model=UserExtract, - messages=[ - {"role": "user", "content": "Extract jason is 25 years old"}, - ], -) - -assert isinstance(user, UserExtract), "Should be instance of UserExtract" -assert user.name.lower() == "jason" -assert user.age == 25 - -print(user.model_dump_json(indent=2)) -""" -{ - "name": "jason", - "age": 25 -} -""" -``` diff --git a/docs/hub/index.md b/docs/hub/index.md index c5b49e4b1..9afb6196c 100644 --- a/docs/hub/index.md +++ b/docs/hub/index.md @@ -1,6 +1,6 @@ --- -title: Instructor Hub: Tutorials and Examples for Getting Started with Instructor -description: Explore the Instructor Hub for tutorials, CLI usage, and examples to enhance your coding experience with the instructor API. +title: Instructor Hub +description: Tutorials and Examples for using Structured Outputs with Instructor --- # Instructor Hub diff --git a/docs/hub/llama-cpp-python.md b/docs/hub/llama-cpp-python.md deleted file mode 100644 index 3f3b9874d..000000000 --- a/docs/hub/llama-cpp-python.md +++ /dev/null @@ -1,123 +0,0 @@ ---- -draft: False -date: 2024-02-12 -slug: llama-cpp-python -tags: - - patching -authors: - - jxnl ---- - -# Structured Outputs with llama-cpp-python - -If you want to try this example using `instructor hub`, you can pull it by running - -```bash -instructor hub pull --slug llama-cpp-python --py > llama_cpp_python_example.py -``` - -Open-source LLMS are gaining popularity, and llama-cpp-python has made the `llama-cpp` model available to obtain structured outputs using JSON schema via a mixture of [constrained sampling](https://llama-cpp-python.readthedocs.io/en/latest/#json-schema-mode) and [speculative decoding](https://llama-cpp-python.readthedocs.io/en/latest/#speculative-decoding). They also support a [OpenAI compatible client](https://llama-cpp-python.readthedocs.io/en/latest/#openai-compatible-web-server), which can be used to obtain structured output as a in process mechanism to avoid any network dependency. - - - -## Patching - -Instructor's patch enhances an create call it with the following features: - -- `response_model` in `create` calls that returns a pydantic model -- `max_retries` in `create` calls that retries the call if it fails by using a backoff strategy - -!!! note "Learn More" - - To learn more, please refer to the [docs](../index.md). To understand the benefits of using Pydantic with Instructor, visit the tips and tricks section of the [why use Pydantic](../why.md) page. If you want to check out examples of using Pydantic with Instructor, visit the [examples](../examples/index.md) page. - -## llama-cpp-python - -Recently llama-cpp-python added support for structured outputs via JSON schema mode. 
This is a time-saving alternative to extensive prompt engineering and can be used to obtain structured outputs. - -In this example we'll cover a more advanced use case of JSON_SCHEMA mode to stream out partial models. To learn more [partial streaming](https://github.com/jxnl/instructor/concepts/partial.md) check out partial streaming. - -```python -import llama_cpp -from llama_cpp.llama_speculative import LlamaPromptLookupDecoding - -import instructor - -from pydantic import BaseModel -from typing import List -from rich.console import Console - - -llama = llama_cpp.Llama( - model_path="../../models/OpenHermes-2.5-Mistral-7B-GGUF/openhermes-2.5-mistral-7b.Q4_K_M.gguf", - n_gpu_layers=-1, - chat_format="chatml", - n_ctx=2048, - draft_model=LlamaPromptLookupDecoding(num_pred_tokens=2), # (1)! - logits_all=True, - verbose=False, -) - - -create = instructor.patch( - create=llama.create_chat_completion_openai_v1, - mode=instructor.Mode.JSON_SCHEMA, # (2)! -) - - -text_block = """ -In our recent online meeting, participants from various backgrounds joined to discuss -the upcoming tech conference. The names and contact details of the participants were as follows: - -- Name: John Doe, Email: johndoe@email.com, Twitter: @TechGuru44 -- Name: Jane Smith, Email: janesmith@email.com, Twitter: @DigitalDiva88 -- Name: Alex Johnson, Email: alexj@email.com, Twitter: @CodeMaster2023 - -During the meeting, we agreed on several key points. The conference will be held on March 15th, 2024, -at the Grand Tech Arena located at 4521 Innovation Drive. Dr. Emily Johnson, a renowned AI researcher, -will be our keynote speaker. - -The budget for the event is set at $50,000, covering venue costs, speaker fees, and promotional activities. -Each participant is expected to contribute an article to the conference blog by February 20th. - -A follow-up meetingis scheduled for January 25th at 3 PM GMT to finalize the agenda and confirm the list of speakers. -""" - - -class User(BaseModel): - name: str - email: str - twitter: str - - -class MeetingInfo(BaseModel): - users: List[User] - date: str - location: str - budget: int - deadline: str - - -extraction_stream = create( - response_model=instructor.Partial[MeetingInfo], # (3)! - messages=[ - { - "role": "user", - "content": f"Get the information about the meeting and the users {text_block}", - }, - ], - stream=True, -) - - -console = Console() - -for extraction in extraction_stream: - obj = extraction.model_dump() - console.clear() # (4)! - console.print(obj) -``` - -We use LlamaPromptLookupDecoding to speed up structured output generation using speculative decoding. The draft model generates candidate tokens during generation 10 is good for GPU, 2 is good for CPU. 2. We use `instructor.Mode.JSON_SCHEMA` return a JSON schema response. 3. We use `instructor.Partial` to stream out partial models. 4. This is just a simple example of how to stream out partial models and clear the console. 
- -![](../img/partial.gif) diff --git a/docs/hub/mistral.md b/docs/hub/mistral.md deleted file mode 100644 index d1fd4e0b7..000000000 --- a/docs/hub/mistral.md +++ /dev/null @@ -1,73 +0,0 @@ ---- -draft: False -date: 2024-02-26 -slug: mistral -tags: - - patching -authors: - - shanktt ---- - -# Structured Outputs with Mistral Large - -If you want to try this example using `instructor hub`, you can pull it by running - -```bash -instructor hub pull --slug mistral --py > mistral_example.py -``` - -Mistral Large is the flagship model from Mistral AI, supporting 32k context windows and functional calling abilities. Mistral Large's addition of [function calling](https://docs.mistral.ai/guides/function-calling/) makes it possible to obtain structured outputs using JSON schema. - -By the end of this blog post, you will learn how to effectively utilize Instructor with Mistral Large. - - - -## Patching - -Instructor's patch enhances the mistral api with the following features: - -- `response_model` in `create` calls that returns a pydantic model -- `max_retries` in `create` calls that retries the call if it fails by using a backoff strategy - -!!! note "Learn More" - - To learn more, please refer to the [docs](../index.md). To understand the benefits of using Pydantic with Instructor, visit the tips and tricks section of the [why use Pydantic](../why.md) page. - -## Mistral Client - -The Mistral client employs a different client than OpenAI, making the patching process slightly different than other examples - -!!! note "Getting access" - - If you want to try this out for yourself check out the [Mistral AI](https://mistral.ai/) website. You can get started [here](https://docs.mistral.ai/). - -```python -import instructor - -from pydantic import BaseModel -from mistralai.client import MistralClient - -# enables `response_model` in chat call -client = MistralClient() - -patched_chat = instructor.from_openai(create=client.chat, mode=instructor.Mode.MISTRAL_TOOLS) - -if __name__ == "__main__": - - class UserDetails(BaseModel): - name: str - age: int - - resp = patched_chat( - model="mistral-large-latest", - response_model=UserDetails, - messages=[ - { - "role": "user", - "content": f'Extract the following entities: "Jason is 20"', - }, - ], - ) - print(resp) - #> name='Jason' age=20 -``` diff --git a/docs/hub/vertexai.md b/docs/hub/vertexai.md deleted file mode 100644 index be8d7c043..000000000 --- a/docs/hub/vertexai.md +++ /dev/null @@ -1,257 +0,0 @@ ---- -draft: False -date: 2024-05-30 -slug: vertexai -tags: - - patching -authors: - - ajac-zero ---- - -# Structured Outputs with Vertex AI - -Vertex AI is the recommended way to deploy the Gemini family of models in production. These models support up to 1 million tokens in their context window and boast native multimodality with files, video, and audio. The Vertex AI SDK offers a preview of tool calling that we can use to obtain structured outputs. - -By the end of this blog post, you will learn how to effectively utilize Instructor with the Gemini family of models. - - - -## Patching - -Instructor's patch enhances the gemini api with the following features: - -- `response_model` in `create` calls that returns a pydantic model -- `max_retries` in `create` calls that retries the call if it fails by using a backoff strategy - -!!! note "Learn More" - - To learn more, please refer to the [docs](../index.md). To understand the benefits of using Pydantic with Instructor, visit the tips and tricks section of the [why use Pydantic](../why.md) page. 
- -## Vertex AI Client - -The Vertex AI client employs a different client than OpenAI, making the patching process slightly different than other examples - -!!! note "Getting access" - - If you want to try this out for yourself check out the [Vertex AI](https://cloud.google.com/vertex-ai?hl=en) console. You can get started [here](https://cloud.google.com/vertex-ai/docs/start/introduction-unified-platform). - -```python -import instructor - -from pydantic import BaseModel -import vertexai.generative_models as gm -import vertexai - -vertexai.init() - -client = gm.GenerativeModel("gemini-1.5-pro-preview-0409") - -# enables `response_model` in chat call -client = instructor.from_vertexai(client) - - -if __name__ == "__main__": - - class UserDetails(BaseModel): - name: str - age: int - - resp = client.create( - response_model=UserDetails, - messages=[ - { - "role": "user", - "content": f'Extract the following entities: "Jason is 20"', - }, - ], - ) - print(resp) - #> name='Jason' age=20 -``` - -### JSON Mode - -By default, `instructor.from_vertexai()` uses the mode `instructor.Mode.VERTEXAI_TOOLS`, which means it will use tool calling to create the model response. Alternatively, you can use `instructor.Mode.VERTEXAI_JSON` to use the response_schema parameter provided by the VertexAI SDK. This parameter will prompt Gemini to respond with JSON directly, which can then be parsed into a model response. - -If you are not getting good results with tool calling, or prefer this method for any reason, you can switch to this mode: - -```python -### rest of the code as above ... - -client = gm.GenerativeModel( - "gemini-1.5-pro-preview-0409", mode=instructor.Mode.VERTEXAI_JSON -) - -## rest of the code as above ... -``` - -## Limitations - -Currently, Vertex AI offers does not support the following attributes from the OpenAPI schema: `optional`, `maximum`, `anyOf`. This means that not all pydantic models will be supported. Below, I'll share some models that could trigger this error and some work-arounds. - -### optional / anyOf - -Using a pydantic model with an `Optional` field raise an exception, because the Optional type is translated to `"anyOf": [integer , null]` which is not yet supported. - -```python -from typing import Optional - - -class User(BaseModel): - name: str - age: Optional[int] - - -resp = client.create( - messages=[ - { - "role": "user", - "content": "Extract Anibal is 23 years old.", - } - ], - response_model=User, -) - -print(resp) -# ValueError: Protocol message Schema has no "anyOf" field. 
-``` - -A workaround if to set a certain default value that Gemini can fall back on if the information is not present: - -```python -from pydantic import Field - - -class User(BaseModel): - name: str - age: int = Field(default=0) # or just age: int = 0 - - -resp = client.create( - messages=[ - { - "role": "user", - "content": "Extract Anibal is _ years old.", - } - ], - response_model=User, -) - -print(resp) -# name='Anibal' age=0 -``` - -This workaround can also work with default_factories: - -```python -class User(BaseModel): - name: str - age: int - siblings: list[str] = Field(default_factory=lambda: []) - - -resp = client.create( - messages=[ - { - "role": "user", - "content": "Extract Anibal is 23 years old.", - } - ], - response_model=User, -) - -print(resp) -# name='Anibal' age=23 siblings=[] -``` - -### maximum - -Using the `lt`(less than) or `gt`(greater than) paramateres in a pydantic field will raise exceptions: - - -```python -class User(BaseModel): - name: str - age: int = Field(gt=0) - - -resp = client.create( - messages=[ - { - "role": "user", - "content": "Extract Anibal is 23 years old.", - } - ], - response_model=User, -) - -print(resp) -# ValueError: Protocol message Schema has no "exclusiveMinimum" field. - - -class User(BaseModel): - name: str - age: int = Field(lt=100) - - -resp = client.create( - messages=[ - { - "role": "user", - "content": "Extract Anibal is _ years old.", - } - ], - response_model=User, -) - -print(resp) -# ValueError: Protocol message Schema has no "exclusiveMaximum" field -``` - -A workaround for this is to use pydantic validadors to change these values post creation - -```python -from pydantic import field_validator - - -class User(BaseModel): - name: str - age: int - - @field_validator("age") - def age_range_limit(cls, age: int) -> int: - if age > 100: - age = 100 - elif age < 0: - age = 0 - return age - - -resp = client.create( - messages=[ - { - "role": "user", - "content": "Extract Anibal is 1023 years old.", - } - ], - response_model=User, -) - -print(resp) -# name='Anibal' age=100 - -resp = client.create( - messages=[ - { - "role": "user", - "content": "Extract Anibal is -12 years old.", - } - ], - response_model=User, -) - -print(resp) -# name='Anibal' age=0 -``` - -So by relying on pydantic, we can mitigate some of the current limitations with the Gemini models 😊. diff --git a/docs/hub/youtube_clips.md b/docs/hub/youtube_clips.md index 719a1f7ce..8721aa245 100644 --- a/docs/hub/youtube_clips.md +++ b/docs/hub/youtube_clips.md @@ -11,11 +11,11 @@ If you're interested in trying this example using `instructor hub`, you can pull ```bash -pip install youtube_transcript_api instructor rich +pip install youtube_transcript_api instructor rich instructor hub pull --slug youtube-clips --py > youtube_clips.py ``` -![youtube clip streaming](./img/youtube.gif) +![youtube clip streaming](../img/youtube.gif) ```python from youtube_transcript_api import YouTubeTranscriptApi @@ -75,11 +75,11 @@ def yield_clips(segments: Iterable[TranscriptSegment]) -> Iterable[YoutubeClips] messages=[ { "role": "system", - "content": """You are given a sequence of YouTube transcripts and your job - is to return notable clips that can be recut as smaller videos. Give very - specific titles and descriptions. Make sure the length of clips is proportional - to the length of the video. Note that this is a transcript and so there might - be spelling errors. Note that and correct any spellings. 
Use the context to + "content": """You are given a sequence of YouTube transcripts and your job + is to return notable clips that can be recut as smaller videos. Give very + specific titles and descriptions. Make sure the length of clips is proportional + to the length of the video. Note that this is a transcript and so there might + be spelling errors. Note that and correct any spellings. Use the context to make sure you're spelling things correctly.""", }, { @@ -127,4 +127,3 @@ if __name__ == "__main__": str(youtube_clip.end), ) console.print(table) -``` diff --git a/docs/hub/img/youtube.gif b/docs/img/youtube.gif similarity index 100% rename from docs/hub/img/youtube.gif rename to docs/img/youtube.gif diff --git a/docs/index.md b/docs/index.md index 082c4b697..72f870f2b 100644 --- a/docs/index.md +++ b/docs/index.md @@ -13,7 +13,7 @@ _Structured outputs powered by llms. Designed for simplicity, transparency, and [![Downloads](https://img.shields.io/pypi/dm/instructor.svg)](https://pypi.python.org/pypi/instructor) [![GPT](https://img.shields.io/badge/docs-InstructorGPT-blue)](https://chat.openai.com/g/g-EvZweRWrE-instructor-gpt) -Instructor makes it easy to get structured data like JSON from LLMs like GPT-3.5, GPT-4, GPT-4-Vision, and open-source models including [Mistral/Mixtral](./hub/together.md), [Anyscale](./hub/anyscale.md), [Ollama](./hub/ollama.md), and [llama-cpp-python](./hub/llama-cpp-python.md). +Instructor makes it easy to get structured data like JSON from LLMs like GPT-3.5, GPT-4, GPT-4-Vision, and open-source models including [Mistral/Mixtral](./integrations/together.md), [Ollama](./integrations/ollama.md), and [llama-cpp-python](./integrations/llama-cpp-python.md). It stands out for its simplicity, transparency, and user-centric design, built on top of Pydantic. Instructor helps you manage [validation context](./concepts/reask_validation.md), retries with [Tenacity](./concepts/retrying.md), and streaming [Lists](./concepts/lists.md) and [Partial](./concepts/partial.md) responses. @@ -69,9 +69,9 @@ Subscribe to our newsletter for updates on AI development. We provide content to - :material-lightning-bolt: **Simplified LLM Interactions** - Support for [OpenAI](./hub/openai.md), [Anthropic](./hub/anthropic.md), [Google](./hub/google.md), [Vertex AI](./hub/vertexai.md), [Mistral/Mixtral](./hub/together.md), [Anyscale](./hub/anyscale.md), [Ollama](./hub/ollama.md), [llama-cpp-python](./hub/llama-cpp-python.md), [Cohere](./hub/cohere.md), [LiteLLM](./hub/litellm.md). + Support for [OpenAI](./integrations/openai.md), [Anthropic](./integrations/anthropic.md), [Google](./integrations/google.md), [Vertex AI](./integrations/vertex.md), [Mistral/Mixtral](./integrations/together.md), [Ollama](./integrations/ollama.md), [llama-cpp-python](./integrations/llama-cpp-python.md), [Cohere](./integrations/cohere.md), [LiteLLM](./integrations/litellm.md). - [:octicons-arrow-right-16: See Hub](./hub/index.md) + [:octicons-arrow-right-16: See Hub](./integrations/index.md) @@ -275,7 +275,7 @@ assert resp.age == 25 The Vertex AI and Gemini Clients have different APIs. When using instructor with these clients, make sure to read the documentation for the specific client you're using to make sure you're using the correct methods. -**Note**: Gemini Tool Calling is still in preview, and there are some limitations. You can learn more about them in the [Vertex AI examples notebook](../hub/vertexai.md). As of now, you cannot use tool calling with Gemini when you have multi-modal inputs (Eg. 
Images, Audio, Video), you must use the `JSON` mode equivalent for that client. +**Note**: Gemini Tool Calling is still in preview, and there are some limitations. You can learn more about them in the [Vertex AI examples notebook](./integrations/vertex.md). As of now, you cannot use tool calling with Gemini when you have multi-modal inputs (Eg. Images, Audio, Video), you must use the `JSON` mode equivalent for that client. #### Google AI diff --git a/docs/integrations/anthropic.md b/docs/integrations/anthropic.md new file mode 100644 index 000000000..5e97308b5 --- /dev/null +++ b/docs/integrations/anthropic.md @@ -0,0 +1,243 @@ +--- +title: "Structured outputs with Anthropic, a complete guide w/ instructor" +description: Learn how to combine Anthropic and Instructor clients to create user models with complex properties in Python. +--- + +# Structured outputs with Anthropic, a complete guide w/ instructor + +Now that we have a [Anthropic](https://www.anthropic.com/) client, we can use it with the `instructor` client to make requests. + +Let's first install the instructor client with anthropic support + +``` +pip install "instructor[anthropic]" +``` + +Once we've done so, getting started is as simple as using our `from_anthropic` method to patch the client up. + +```python +from pydantic import BaseModel +from typing import List +import anthropic +import instructor + +# Patching the Anthropics client with the instructor for enhanced capabilities +client = instructor.from_anthropic( + anthropic.Anthropic(), +) + + +class Properties(BaseModel): + name: str + value: str + + +class User(BaseModel): + name: str + age: int + properties: List[Properties] + + +# client.messages.create will also work due to the instructor client +user_response = client.chat.completions.create( + model="claude-3-haiku-20240307", + max_tokens=1024, + max_retries=0, + messages=[ + { + "role": "user", + "content": "Create a user for a model with a name, age, and properties.", + } + ], + response_model=User, +) # type: ignore + +print(user_response.model_dump_json(indent=2)) +""" +{ + "name": "John Doe", + "age": 35, + "properties": [ + { + "name": "City", + "value": "New York" + }, + { + "name": "Occupation", + "value": "Software Engineer" + } + ] +} +""" +``` + +## Streaming Support + +Instructor has two main ways that you can use to stream responses out + +1. **Iterables**: These are useful when you'd like to stream a list of objects of the same type (Eg. use structured outputs to extract multiple users) +2. **Partial Streaming**: This is useful when you'd like to stream a single object and you'd like to immediately start processing the response as it comes in. + +### Partials + +You can use our `create_partial` method to stream a single object. Note that validators should not be declared in the response model when streaming objects because it will break the streaming process. 
+ +```python +from instructor import from_anthropic +import anthropic +from pydantic import BaseModel + +client = from_anthropic(anthropic.Anthropic()) + + +class User(BaseModel): + name: str + age: int + bio: str + + +# Stream partial objects as they're generated +for partial_user in client.chat.completions.create_partial( + model="claude-3-5-haiku-20241022", + messages=[ + {"role": "user", "content": "Create a user profile for Jason, age 25"}, + ], + response_model=User, + max_tokens=4096, +): + print(f"Current state: {partial_user}") + # > Current state: name='Jason' age=None bio=None + # > Current state: name='Jason' age=25 bio='Jason is a 25-year-old with an adventurous spirit and a love for technology. He is' + # > Current state: name='Jason' age=25 bio='Jason is a 25-year-old with an adventurous spirit and a love for technology. He is always on the lookout for new challenges and opportunities to grow both personally and professionally.' + +``` + +### Iterable Example + +You can also use our `create_iterable` method to stream a list of objects. This is helpful when you'd like to extract multiple instances of the same response model from a single prompt. + +```python +from instructor import from_anthropic +import anthropic +from pydantic import BaseModel + +client = from_anthropic(anthropic.Anthropic()) + + +class User(BaseModel): + name: str + age: int + + +users = client.chat.completions.create_iterable( + model="claude-3-5-haiku-20241022", + messages=[ + { + "role": "user", + "content": """ + Extract users: + 1. Jason is 25 years old + 2. Sarah is 30 years old + 3. Mike is 28 years old + """, + }, + ], + max_tokens=4096, + response_model=User, +) + +for user in users: + print(user) + #> name='Jason' age=25 + #> name='Sarah' age=30 + #> name='Mike' age=28 +``` + +## Instructor Modes + +We provide several modes to make it easy to work with the different response models that Anthropic supports + +1. `instructor.Mode.ANTHROPIC_JSON` : This uses the text completion API from the Anthropic API and then extracts out the desired response model from the text completion model +2. `instructor.Mode.ANTHROPIC_TOOLS` : This uses Anthropic's [tools calling API](https://docs.anthropic.com/en/docs/build-with-claude/tool-use) to return structured outputs to the client + +In general, we recommend using `Mode.ANTHROPIC_TOOLS` because it's the best way to ensure you have the desired response schema that you want. + +## Caching + +If you'd like to use caching with the Anthropic Client, we also support it for images and text input. + +### Caching Text Input + +Here's how you can implement caching for text input ( assuming you have a giant `book.txt` file that you read in). + +We've written a comprehensive walkthrough of how to use caching to implement Anthropic's new Contextual Retrieval method that gives a significant bump to retrieval accuracy. 
+ +```python +from instructor import Instructor, Mode, patch +from anthropic import Anthropic +from pydantic import BaseModel + +# Set up the client with prompt caching +client = instructor.from_anthropic(Anthropic()) + +# Define your Pydantic model +class Character(BaseModel): + name: str + description: str + +# Load your large context +with open("./book.txt", "r") as f: + book = f.read() + +# Make multiple calls using the cached context +for _ in range(2): + resp, completion = client.chat.completions.create_with_completion( + model="claude-3-haiku-20240307", + messages=[ + { + "role": "user", + "content": [ + { + "type": "text", + "text": "" + book + "", + "cache_control": {"type": "ephemeral"}, + }, + { + "type": "text", + "text": "Extract a character from the text given above", + }, + ], + }, + ], + response_model=Character, + max_tokens=1000, + ) +``` + +### Caching Images + +We also support caching for images. This helps significantly, especially if you're using images repeatedly to save on costs. Read more about it [here](../concepts/caching.md) + +```python +import instructor +from anthropic import Anthropic + +client = instructor.from_anthropic(Anthropic(), enable_prompt_caching=True) + +cache_control = {"type": "ephemeral"} +response = client.chat.completions.create( + model="claude-3-haiku-20240307", + response_model=ImageAnalyzer, # This can be set to `None` to return an Anthropic prompt caching message + messages=[ + { + "role": "user", + "content": [ + "What is in this two images?", + {"type": "image", "source": "https://example.com/image.jpg", "cache_control": cache_control}, + {"type": "image", "source": "path/to/image.jpg", "cache_control": cache_control}, + ] + } + ], + autodetect_images=True +) +``` diff --git a/docs/integrations/azure.md b/docs/integrations/azure.md new file mode 100644 index 000000000..c3742f29d --- /dev/null +++ b/docs/integrations/azure.md @@ -0,0 +1,309 @@ +--- +title: Structured outputs with Azure OpenAI, a complete guide w/ instructor +description: Learn how to use Azure OpenAI with instructor for structured outputs, including async/sync implementations, streaming, and validation. +--- + +# Structured Outputs with Azure OpenAI + +This guide demonstrates how to use Azure OpenAI with instructor for structured outputs. Azure OpenAI provides the same powerful models as OpenAI but with enterprise-grade security and compliance features through Microsoft Azure. + +## Installation + +We can use the same installation as we do for OpenAI since the default `openai` client ships with an AzureOpenAI client. + +First, install the required dependencies: + +```bash +pip install instructor +``` + +Next, make sure that you've enabled Azure OpenAI in your Azure account and have a deployment for the model you'd like to use. [Here is a guide to get started](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal) + +Once you've done so, you'll have an endpoint and a API key to be used to configure the client. + +```bash +instructor.exceptions.InstructorRetryException: Error code: 401 - {'statusCode': 401, 'message': 'Unauthorized. Access token is missing, invalid, audience is incorrect (https://cognitiveservices.azure.com), or have expired.'} +``` + +If you see an error like the one above, make sure you've set the correct endpoint and API key in the client. + +## Authentication + +To use Azure OpenAI, you'll need: + +1. Azure OpenAI endpoint +2. API key +3. 
Deployment name + +```python +import os +from openai import AzureOpenAI +import instructor + +# Configure Azure OpenAI client +client = AzureOpenAI( + api_key=os.environ["AZURE_OPENAI_API_KEY"], + api_version="2024-02-01", + azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"] +) + +# Patch the client with instructor +client = instructor.from_openai(client) +``` + +## Basic Usage + +Here's a simple example using a Pydantic model: + +```python +import os +import instructor +from openai import AzureOpenAI +from pydantic import BaseModel + +client = AzureOpenAI( + api_key=os.environ["AZURE_OPENAI_API_KEY"], + api_version="2024-02-01", + azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"], +) +client = instructor.from_openai(client) + + +class User(BaseModel): + name: str + age: int + + +# Synchronous usage +user = client.chat.completions.create( + model="gpt-4o-mini", # Your deployment name + messages=[{"role": "user", "content": "John is 30 years old"}], + response_model=User, +) + +print(user) +# > name='John' age=30 +``` + +## Async Implementation + +Azure OpenAI supports async operations: + +```python +import os +import instructor +import asyncio +from openai import AsyncAzureOpenAI +from pydantic import BaseModel + +client = AsyncAzureOpenAI( + api_key=os.environ["AZURE_OPENAI_API_KEY"], + api_version="2024-02-15-preview", + azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"], +) +client = instructor.from_openai(client) + + +class User(BaseModel): + name: str + age: int + + +async def get_user_async(): + return await client.chat.completions.create( + model="gpt-4o-mini", + messages=[{"role": "user", "content": "John is 30 years old"}], + response_model=User, + ) + + +# Run async function +user = asyncio.run(get_user_async()) +print(user) +# > name='John' age=30 +``` + +## Nested Models + +Azure OpenAI handles complex nested structures: + +```python +import os +import instructor +from openai import AzureOpenAI +from pydantic import BaseModel + +client = AzureOpenAI( + api_key=os.environ["AZURE_OPENAI_API_KEY"], + api_version="2024-02-01", + azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"], +) +client = instructor.from_openai(client) + + +class Address(BaseModel): + street: str + city: str + country: str + + +class UserWithAddress(BaseModel): + name: str + age: int + addresses: list[Address] + + +resp = client.chat.completions.create( + model="gpt-4o-mini", # Your deployment name + messages=[ + { + "role": "user", + "content": """ + John is 30 years old and has two addresses: + 1. 123 Main St, New York, USA + 2. 456 High St, London, UK + """, + } + ], + response_model=UserWithAddress, +) + +print(resp) +# { +# 'name': 'John', +# 'age': 30, +# 'addresses': [ +# { +# 'street': '123 Main St', +# 'city': 'New York', +# 'country': 'USA' +# }, +# { +# 'street': '456 High St', +# 'city': 'London', +# 'country': 'UK' +# } +# ] +# } +``` + +## Streaming Support + +Instructor has two main ways that you can use to stream responses out + +1. **Iterables**: These are useful when you'd like to stream a list of objects of the same type (Eg. use structured outputs to extract multiple users) +2. **Partial Streaming**: This is useful when you'd like to stream a single object and you'd like to immediately start processing the response as it comes in. + +### Partials + +You can use our `create_partial` method to stream a single object. Note that validators should not be declared in the response model when streaming objects because it will break the streaming process. 
+ +```python +from instructor import from_openai +from openai import AzureOpenAI +from pydantic import BaseModel +import os + +client = from_openai( + AzureOpenAI( + api_key=os.environ["AZURE_OPENAI_API_KEY"], + api_version="2024-02-01", + azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"], + ) +) + + +class User(BaseModel): + name: str + age: int + bio: str + + +# Stream partial objects as they're generated +user = client.chat.completions.create_partial( + model="gpt-4o-mini", + messages=[ + {"role": "user", "content": "Create a user profile for Jason, age 25"}, + ], + response_model=User, +) + +for user_partial in user: + print(user_partial) + +# > name='Jason' age=None bio='None' +# > name='Jason' age=25 bio='A tech' +# > name='Jason' age=25 bio='A tech enthusiast' +# > name='Jason' age=25 bio='A tech enthusiast who loves coding, gaming, and exploring new' +# > name='Jason' age=25 bio='A tech enthusiast who loves coding, gaming, and exploring new technologies' + +``` + +## Iterable Responses + +```python +from instructor import from_openai +from openai import AzureOpenAI +from pydantic import BaseModel +import os + +client = from_openai( + AzureOpenAI( + api_key=os.environ["AZURE_OPENAI_API_KEY"], + api_version="2024-02-01", + azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"], + ) +) + + +class User(BaseModel): + name: str + age: int + + +# Extract multiple users from text +users = client.chat.completions.create_iterable( + model="gpt-4o-mini", + messages=[ + { + "role": "user", + "content": """ + Extract users: + 1. Jason is 25 years old + 2. Sarah is 30 years old + 3. Mike is 28 years old + """, + }, + ], + response_model=User, +) + +for user in users: + print(user) +#> name='Jason' age=25 +# > name='Sarah' age=30 +# > name='Mike' age=28 + +``` + +## Instructor Modes + +We provide several modes to make it easy to work with the different response models that OpenAI supports + +1. `instructor.Mode.TOOLS` : This uses the [tool calling API](https://platform.openai.com/docs/guides/function-calling) to return structured outputs to the client +2. `instructor.Mode.JSON` : This forces the model to return JSON by using [OpenAI's JSON mode](https://platform.openai.com/docs/guides/structured-outputs#json-mode). +3. `instructor.Mode.FUNCTIONS` : This uses OpenAI's function calling API to return structured outputs and will be deprecated in the future. +4. `instructor.Mode.PARALLEL_TOOLS` : This uses the [parallel tool calling API](https://platform.openai.com/docs/guides/function-calling#configuring-parallel-function-calling) to return structured outputs to the client. This allows the model to generate multiple calls in a single response. +5. `instructor.Mode.MD_JSON` : This makes a simple call to the OpenAI chat completion API and parses the raw response as JSON. +6. `instructor.Mode.TOOLS_STRICT` : This uses the new Open AI structured outputs API to return structured outputs to the client using constrained grammar sampling. This restricts users to a subset of the JSON schema. +7. `instructor.Mode.JSON_O1` : This is a mode for the `O1` model. We created a new mode because `O1` doesn't support any system messages, tool calling or streaming so you need to use this mode to use Instructor with `O1`. + +In general, we recommend using `Mode.Tools` because it's the most flexible and future-proof mode. It has the largest set of features that you can specify your schema in and makes things significantly easier to work with. 
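If you want to opt into a specific mode rather than relying on the default, pass it when patching the Azure client. This is a minimal sketch that reuses the same environment variables as the earlier examples; adjust the mode to whichever of the options above fits your deployment.

```python
import os
import instructor
from openai import AzureOpenAI

# Patch the Azure OpenAI client with an explicit mode (TOOLS shown as an example)
client = instructor.from_openai(
    AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    ),
    mode=instructor.Mode.TOOLS,
)
```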
+ +## Best Practices + +## Additional Resources + +- [Azure OpenAI Documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/) +- [Instructor Documentation](https://instructor-ai.github.io/instructor/) +- [Azure OpenAI Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) diff --git a/docs/integrations/cerebras.md b/docs/integrations/cerebras.md new file mode 100644 index 000000000..bfb326fcb --- /dev/null +++ b/docs/integrations/cerebras.md @@ -0,0 +1,246 @@ +--- +title: "Structured outputs with Cerebras, a complete guide w/ instructor" +description: "Complete guide to using Instructor with Cerebras's hardware-accelerated AI models. Learn how to generate structured, type-safe outputs with high-performance computing." +--- + +# Structured outputs with Cerebras, a complete guide w/ instructor + +Cerebras provides hardware-accelerated AI models optimized for high-performance computing environments. This guide shows you how to use Instructor with Cerebras's models for type-safe, validated responses. + +## Quick Start + +Install Instructor with Cerebras support: + +```bash +pip install "instructor[cerebras_cloud_sdk]" +``` + +## Simple User Example (Sync) + +```python +import instructor +from cerebras.cloud.sdk import Cerebras +from pydantic import BaseModel + +client = instructor.from_cerebras(Cerebras()) + +# Enable instructor patches +client = instructor.from_cerebras(client) + +class User(BaseModel): + name: str + age: int + +# Create structured output +resp = client.chat.completions.create( + model="llama3.1-70b", + messages=[ + { + "role": "user", + "content": "Extract the name and age of the person in this sentence: John Smith is 29 years old.", + } + ], + response_model=User, +) + +print(resp) +#> User(name='John Smith', age=29) +``` + +## Simple User Example (Async) + +```python +from cerebras.cloud.sdk import AsyncCerebras +import instructor +from pydantic import BaseModel +import asyncio + +# Initialize async client +client = AsyncCerebras() + +# Enable instructor patches +client = instructor.from_cerebras(client) + +class User(BaseModel): + name: str + age: int + +async def extract_user(): + resp = await client.chat.completions.create( + model="llama3.1-70b", + messages=[ + { + "role": "user", + "content": "Extract the name and age of the person in this sentence: John Smith is 29 years old.", + } + ], + response_model=User, + ) + return resp + +# Run async function +resp = asyncio.run(extract_user()) +print(resp) +#> User(name='John Smith', age=29) +``` + +## Nested Example + +```python +from pydantic import BaseModel +import instructor +from cerebras.cloud.sdk import Cerebras + +client = instructor.from_cerebras(Cerebras()) + + +class Address(BaseModel): + street: str + city: str + country: str + + +class User(BaseModel): + name: str + age: int + addresses: list[Address] + + +# Create structured output with nested objects +user = client.chat.completions.create( + messages=[ + { + "role": "user", + "content": """ + Extract: Jason is 25 years old. 
+ He lives at 123 Main St, New York, USA + and has a summer house at 456 Beach Rd, Miami, USA + """, + } + ], + model="llama3.1-70b", + response_model=User, +) + +print(user) +#> { +#> 'name': 'Jason', +#> 'age': 25, +#> 'addresses': [ +#> { +#> 'street': '123 Main St', +#> 'city': 'New York', +#> 'country': 'USA' +#> }, +#> { +#> 'street': '456 Beach Rd', +#> 'city': 'Miami', +#> 'country': 'USA' +#> } +#> ] +#> } +``` + +## Streaming Support + +Instructor has two main ways that you can use to stream responses out + +1. **Iterables**: These are useful when you'd like to stream a list of objects of the same type (Eg. use structured outputs to extract multiple users) +2. **Partial Streaming**: This is useful when you'd like to stream a single object and you'd like to immediately start processing the response as it comes in. + +We currently support partial streaming for Cerebras by parsing the raw text completion. We have not implemented streaming for function calling at this point in time yet. Please make sure you have `mode=instructor.Mode.CEREBRAS_JSON` set when using partial streaming. + +```python +import instructor +from cerebras.cloud.sdk import Cerebras, AsyncCerebras +from pydantic import BaseModel +from typing import Iterable + +client = instructor.from_cerebras(Cerebras(), mode=instructor.Mode.CEREBRAS_JSON) + + +class Person(BaseModel): + name: str + age: int + + +resp = client.chat.completions.create_partial( + model="llama3.1-70b", + messages=[ + { + "role": "user", + "content": "Ivan is 27 and lives in Singapore", + } + ], + response_model=Person, + stream=True, +) + +for person in resp: + print(person) + # > name=None age=None + # > name='Ivan' age=None + # > name='Ivan' age=27 + +``` + +## Iterable Example + +```python +import instructor +from cerebras.cloud.sdk import Cerebras, AsyncCerebras +from pydantic import BaseModel +from typing import Iterable + +client = instructor.from_cerebras(Cerebras(), mode=instructor.Mode.CEREBRAS_JSON) + + +class Person(BaseModel): + name: str + age: int + + +resp = client.chat.completions.create_iterable( + model="llama3.1-70b", + messages=[ + { + "role": "user", + "content": "Extract all users from this sentence : Chris is 27 and lives in San Francisco, John is 30 and lives in New York while their college roomate Jessica is 26 and lives in London", + } + ], + response_model=Person, + stream=True, +) + +for person in resp: + print(person) + # > Person(name='Chris', age=27) + # > Person(name='John', age=30) + # > Person(name='Jessica', age=26) + +``` + +## Instructor Hooks + +Instructor provides several hooks to customize behavior: + +### Validation Hook + +```python +from instructor import Instructor + +def validation_hook(value, retry_count, exception): + print(f"Validation failed {retry_count} times: {exception}") + return retry_count < 3 # Retry up to 3 times + +instructor.patch(client, validation_hook=validation_hook) +``` + +## Instructor Modes + +We provide serveral modes to make it easy to work with the different response models that Cerebras Supports + +1. `instructor.Mode.CEREBRAS_JSON` : This parses the raw completions as a valid JSON object. +2. `instructor.Mode.CEREBRAS_TOOLS` : This uses Cerebras's tool calling mode to return structured outputs to the client. + +In general, we recommend using `Mode.CEREBRAS_TOOLS` because it's the most flexible and future-proof mode. It has the largest set of features that you can specify your schema in and makes things significantly easier to work with. 
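Switching between these modes only changes the `mode` argument passed to `from_cerebras`; the rest of your calls stay the same as in the examples above. A minimal sketch:

```python
import instructor
from cerebras.cloud.sdk import Cerebras

# Recommended: tool calling mode. Swap in CEREBRAS_JSON to parse raw JSON completions instead.
client = instructor.from_cerebras(Cerebras(), mode=instructor.Mode.CEREBRAS_TOOLS)
```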
diff --git a/docs/hub/cohere.md b/docs/integrations/cohere.md similarity index 92% rename from docs/hub/cohere.md rename to docs/integrations/cohere.md index f041041bf..2ac80c86a 100644 --- a/docs/hub/cohere.md +++ b/docs/integrations/cohere.md @@ -1,9 +1,9 @@ --- -title: Using Cohere for Structured Outputs in Python +title: Structured outputs with Cohere, a complete guide w/ instructor description: Learn how to leverage Cohere's command models with Python's instructor library for structured data outputs. --- -# Structured Outputs with Cohere +# Structured outputs with Cohere, a complete guide w/ instructor If you want to try this example using `instructor hub`, you can pull it by running @@ -16,10 +16,14 @@ You can now use any of the Cohere's [command models](https://docs.cohere.com/doc You'll need a cohere API key which can be obtained by signing up [here](https://dashboard.cohere.com/) and gives you [free](https://cohere.com/pricing), rate-limited usage for learning and prototyping. ## Setup + ``` -pip install cohere +pip install "instructor[cohere]" + ``` + Export your key: + ``` export CO_API_KEY= ``` @@ -87,4 +91,4 @@ print(group.model_dump_json(indent=2)) ] } """ -``` \ No newline at end of file +``` diff --git a/docs/integrations/fireworks.md b/docs/integrations/fireworks.md new file mode 100644 index 000000000..a12edab1d --- /dev/null +++ b/docs/integrations/fireworks.md @@ -0,0 +1,258 @@ +--- +title: "Structured outputs with Fireworks, a complete guide w/ instructor" +description: "Complete guide to using Instructor with Fireworks AI models. Learn how to generate structured, type-safe outputs with high-performance, cost-effective AI capabilities." +--- + +# Structured outputs with Fireworks, a complete guide w/ instructor + +Fireworks provides efficient and cost-effective AI models with enterprise-grade reliability. This guide shows you how to use Instructor with Fireworks's models for type-safe, validated responses. 
+ +## Quick Start + +Install Instructor with Fireworks support: + +```bash +pip install "instructor[fireworks-ai]" +``` + +## Simple User Example (Sync) + +```python +from fireworks.client import Fireworks +import instructor +from pydantic import BaseModel + +# Initialize the client +client = Fireworks() + +# Enable instructor patches +client = instructor.from_fireworks(client) + + +class User(BaseModel): + name: str + age: int + + +# Create structured output +user = client.chat.completions.create( + messages=[ + { + "role": "user", + "content": "Extract: Jason is 25 years old", + } + ], + model="accounts/fireworks/models/llama-v3-8b-instruct", + response_model=User, +) + +print(user) +# > User(name='Jason', age=25) + +``` + +## Simple User Example (Async) + +```python +from fireworks.client import AsyncFireworks +import instructor +from pydantic import BaseModel +import asyncio + +# Initialize async client +client = AsyncFireworks() + +# Enable instructor patches +client = instructor.from_fireworks(client) + + +class User(BaseModel): + name: str + age: int + + +async def extract_user(): + user = await client.chat.completions.create( + messages=[ + { + "role": "user", + "content": "Extract: Jason is 25 years old", + } + ], + model="accounts/fireworks/models/llama-v3-8b-instruct", + response_model=User, + ) + return user + + +# Run async function +user = asyncio.run(extract_user()) +print(user) # User(name='Jason', age=25) + +``` + +## Nested Example + +```python +from fireworks.client import Fireworks +import instructor +from pydantic import BaseModel + + +# Enable instructor patches +client = instructor.from_fireworks(Fireworks()) + + +class Address(BaseModel): + street: str + city: str + country: str + + +class User(BaseModel): + name: str + age: int + addresses: list[Address] + + +# Create structured output with nested objects +user = client.chat.completions.create( + messages=[ + { + "role": "user", + "content": """ + Extract: Jason is 25 years old. + He lives at 123 Main St, New York, USA + and has a summer house at 456 Beach Rd, Miami, USA + """, + } + ], + model="accounts/fireworks/models/llama-v3-8b-instruct", + response_model=User, +) + +print(user) +#> { +#> 'name': 'Jason', +#> 'age': 25, +#> 'addresses': [ +#> { +#> 'street': '123 Main St', +#> 'city': 'New York', +#> 'country': 'USA' +#> }, +#> { +#> 'street': '456 Beach Rd', +#> 'city': 'Miami', +#> 'country': 'USA' +#> } +#> ] +#> } +``` + +## Streaming Support + +Instructor has two main ways that you can use to stream responses out + +1. **Iterables**: These are useful when you'd like to stream a list of objects of the same type (Eg. use structured outputs to extract multiple users) +2. **Partial Streaming**: This is useful when you'd like to stream a single object and you'd like to immediately start processing the response as it comes in. 
+ +### Partial Streaming Example + +```python +from fireworks.client import Fireworks +import instructor +from pydantic import BaseModel + + +# Enable instructor patches +client = instructor.from_fireworks(Fireworks()) + + +class User(BaseModel): + name: str + age: int + bio: str + + +user = client.chat.completions.create_partial( + model="accounts/fireworks/models/llama-v3-8b-instruct", + messages=[ + { + "role": "user", + "content": "Create a user profile for Jason + 1 sentence bio, age 25", + }, + ], + response_model=User, +) + +for user_partial in user: + print(user_partial) + # name=None age=None bio=None + # name='Jason' age=None bio=None + # name='Jason' age=25 bio="When he's" + # name='Jason' age=25 bio="When he's not working as a graphic designer, Jason can usually be found trying out new craft beers or attempting to cook something other than ramen noodles." + +``` + +## Iterable Example + +```python +from fireworks.client import Fireworks +import instructor +from pydantic import BaseModel + + +# Enable instructor patches +client = instructor.from_fireworks(Fireworks()) + + +class User(BaseModel): + name: str + age: int + + +# Extract multiple users from text +users = client.chat.completions.create_iterable( + model="accounts/fireworks/models/llama-v3-8b-instruct", + messages=[ + { + "role": "user", + "content": """ + Extract users: + 1. Jason is 25 years old + 2. Sarah is 30 years old + 3. Mike is 28 years old + """, + }, + ], + response_model=User, +) + +for user in users: + print(user) + + # name='Jason' age=25 + # name='Sarah' age=30 + # name='Mike' age=28 +``` + +## Instructor Modes + +We provide several modes to make it easy to work with the different response models that Fireworks supports + +1. `instructor.Mode.FIREWORKS_JSON` : This parses the raw text completion into a pydantic object +2. `instructor.Mode.FIREWORKS_TOOLS` : This uses Fireworks's tool calling API to return structured outputs to the client + +## Related Resources + +- [Fireworks Documentation](https://docs.fireworks.ai/) +- [Instructor Core Concepts](../concepts/index.md) +- [Type Validation Guide](../concepts/validation.md) +- [Advanced Usage Examples](../examples/index.md) + +## Updates and Compatibility + +Instructor maintains compatibility with Fireworks's latest API versions. Check the [changelog](https://github.com/jxnl/instructor/blob/main/CHANGELOG.md) for updates. + +Note: Always verify model-specific features and limitations before implementing streaming functionality in production environments. diff --git a/docs/integrations/google.md b/docs/integrations/google.md new file mode 100644 index 000000000..fcc280f0e --- /dev/null +++ b/docs/integrations/google.md @@ -0,0 +1,277 @@ +--- +title: "Structured outputs with Google/Gemini, a complete guide w/ instructor" +description: "Complete guide to using Instructor with Google's Gemini models. Learn how to generate structured, type-safe outputs with Google's advanced AI capabilities." +--- + +# Structured outputs with Google/Gemini, a complete guide w/ instructor + +This guide will show you how to use Instructor with the Google.GenerativeAI library. We recommend this library for most users as it's significantly easier to get started with. + +## Google.GenerativeAI + +Google's Gemini models provide powerful AI capabilities with multimodal support. This guide shows you how to use Instructor with Google's Gemini models for type-safe, validated responses. 
+ +```bash +pip install "instructor[google-generativeai] +``` + +## Simple User Example (Sync) + +```python +import instructor +import google.generativeai as genai +from pydantic import BaseModel + + +class User(BaseModel): + name: str + age: int + + +client = instructor.from_gemini( + client=genai.GenerativeModel( + model_name="models/gemini-1.5-flash-latest", + ), + mode=instructor.Mode.GEMINI_JSON, +) + +# note that client.chat.completions.create will also work +resp = client.messages.create( + messages=[ + { + "role": "user", + "content": "Extract Jason is 25 years old.", + } + ], + response_model=User, +) + +print(resp) +``` + +## Simple User Example (Async) + +!!! info "Async Support" + + Instructor supports async mode for the Google.GenerativeAI library. If you're using the async client, make sure that your client is declared within the same event loop as the function that calls it. If not you'll get a bunch of errors. + +```python +import instructor +import google.generativeai as genai +from pydantic import BaseModel +import asyncio + + +class User(BaseModel): + name: str + age: int + + +async def extract_user(): + client = instructor.from_gemini( + client=genai.GenerativeModel( + model_name="models/gemini-1.5-flash-latest", + ), + use_async=True, + ) + + user = await client.chat.completions.create( + messages=[ + { + "role": "user", + "content": "Extract Jason is 25 years old.", + } + ], + response_model=User, + ) + return user + + +# Run async function +user = asyncio.run(extract_user()) +print(user) # User(name='Jason', age=25) + +``` + +## Nested Example + +```python +import instructor +import google.generativeai as genai +from pydantic import BaseModel + + +class Address(BaseModel): + street: str + city: str + country: str + + +class User(BaseModel): + name: str + age: int + addresses: list[Address] + + +client = instructor.from_gemini( + client=genai.GenerativeModel( + model_name="models/gemini-1.5-flash-latest", + ), +) + +user = client.chat.completions.create( + messages=[ + { + "role": "user", + "content": """ + Extract: Jason is 25 years old. + He lives at 123 Main St, New York, USA + and has a summer house at 456 Beach Rd, Miami, USA + """, + }, + ], + response_model=User, +) + +print(user) +#> { +#> 'name': 'Jason', +#> 'age': 25, +#> 'addresses': [ +#> { +#> 'street': '123 Main St', +#> 'city': 'New York', +#> 'country': 'USA' +#> }, +#> { +#> 'street': '456 Beach Rd', +#> 'city': 'Miami', +#> 'country': 'USA' +#> } +#> ] +#> } +``` + +## Streaming Support + +Instructor has two main ways that you can use to stream responses out + +1. **Iterables**: These are useful when you'd like to stream a list of objects of the same type (Eg. use structured outputs to extract multiple users) +2. **Partial Streaming**: This is useful when you'd like to stream a single object and you'd like to immediately start processing the response as it comes in. 
+ +### Partials + +```python +import instructor +import google.generativeai as genai +from pydantic import BaseModel + + +client = instructor.from_gemini( + client=genai.GenerativeModel( + model_name="models/gemini-1.5-flash-latest", + ), +) + + +class User(BaseModel): + name: str + age: int + bio: str + + +user = client.chat.completions.create_partial( + messages=[ + { + "role": "user", + "content": "Create a user profile for Jason and 1 sentence bio, age 25", + }, + ], + response_model=User, +) + +for user_partial in user: + print(user_partial) + # > name=None age=None bio=None + # > name=None age=25 bio='Jason is a great guy' + # > name='Jason' age=25 bio='Jason is a great guy' +``` + +### Iterable Example + +```python +import instructor +import google.generativeai as genai +from pydantic import BaseModel + + +client = instructor.from_gemini( + client=genai.GenerativeModel( + model_name="models/gemini-1.5-flash-latest", + ), +) + + +class User(BaseModel): + name: str + age: int + + +# Extract multiple users from text +users = client.chat.completions.create_iterable( + messages=[ + { + "role": "user", + "content": """ + Extract users: + 1. Jason is 25 years old + 2. Sarah is 30 years old + 3. Mike is 28 years old + """, + }, + ], + response_model=User, +) + +for user in users: + print(user) + #> name='Jason' age=25 + #> name='Sarah' age=30 + #> name='Mike' age=28 +``` + +## Instructor Modes + +We provide several modes to make it easy to work with the different response models that Gemini supports + +1. `instructor.Mode.GEMINI_JSON` : This parses the raw text completion into a pydantic object +2. `instructor.Mode.GEMINI_TOOLS` : This uses Gemini's tool calling API to return structured outputs to the client + +## Available Models + +Google offers several Gemini models: + +- Gemini Flash (General purpose) +- Gemini Pro (Multimodal) +- Gemini Flash-8b (Coming soon) + +## Using Gemini's Multimodal Capabilities + +We've written an extensive list of guides on how to use gemini's multimodal capabilities with instructor. + +- [Using Geminin To Extract Travel Video Recomendations](../blog/posts/multimodal-gemini.md) +- [Parsing PDFs with Gemini](../blog/posts/chat-with-your-pdf-with-gemini.md) +- [Generating Citations with Gemini](../blog/posts/generating-pdf-citations.md) + +Stay tuned to the blog for more guides on using Gemini with instructor. + +## Related Resources + +- [Google AI Documentation](https://ai.google.dev/) +- [Instructor Core Concepts](../concepts/index.md) +- [Type Validation Guide](../concepts/validation.md) +- [Advanced Usage Examples](../examples/index.md) + +## Updates and Compatibility + +Instructor maintains compatibility with Google's latest API versions. Check the [changelog](https://github.com/jxnl/instructor/blob/main/CHANGELOG.md) for updates. diff --git a/docs/integrations/groq.md b/docs/integrations/groq.md new file mode 100644 index 000000000..0b1840f1a --- /dev/null +++ b/docs/integrations/groq.md @@ -0,0 +1,157 @@ +--- +title: Structured Outputs with Groq AI and Pydantic +description: Learn how to use Groq AI for structured outputs with Pydantic in Python and enhance API interactions. +--- + +# Structured Outputs with Groq AI + +If you want to try this example using `instructor hub`, you can pull it by running + +```bash +instructor hub pull --slug groq --py > groq_example.py +``` + +you'll need to sign up for an account and get an API key. You can do that [here](https://console.groq.com/docs/quickstart). 
+ +```bash +export GROQ_API_KEY= +pip install "instructor[groq]" +``` + +## Groq AI + +Groq supports structured outputs with their new `llama-3-groq-70b-8192-tool-use-preview` model. + +### Sync Example + +```python +import os +from groq import Groq +import instructor +from pydantic import BaseModel + +# Initialize with API key +client = Groq(api_key=os.getenv("GROQ_API_KEY")) + +# Enable instructor patches for Groq client +client = instructor.from_groq(client) + + +class User(BaseModel): + name: str + age: int + + +# Create structured output +user = client.chat.completions.create( + model="llama3-groq-70b-8192-tool-use-preview", + messages=[ + {"role": "user", "content": "Extract: Jason is 25 years old"}, + ], + response_model=User, +) + +print(user) +# > User(name='Jason', age=25) +``` + +### Async Example + +```python +import os +from groq import AsyncGroq +import instructor +from pydantic import BaseModel +import asyncio + +# Initialize with API key +client = AsyncGroq(api_key=os.getenv("GROQ_API_KEY")) + +# Enable instructor patches for Groq client +client = instructor.from_groq(client) + + +class User(BaseModel): + name: str + age: int + + +async def extract_user(): + user = await client.chat.completions.create( + model="llama3-groq-70b-8192-tool-use-preview", + messages=[ + {"role": "user", "content": "Extract: Jason is 25 years old"}, + ], + response_model=User, + ) + return user + + +# Run async function +user = asyncio.run(extract_user()) +print(user) +# > User(name='Jason', age=25) + +``` + +### Nested Object + +```python +import os +from groq import Groq +import instructor +from pydantic import BaseModel + +# Initialize with API key +client = Groq(api_key=os.getenv("GROQ_API_KEY")) + +# Enable instructor patches for Groq client +client = instructor.from_groq(client) + + +class Address(BaseModel): + street: str + city: str + country: str + + +class User(BaseModel): + name: str + age: int + addresses: list[Address] + + +# Create structured output with nested objects +user = client.chat.completions.create( + model="llama3-groq-70b-8192-tool-use-preview", + messages=[ + { + "role": "user", + "content": """ + Extract: Jason is 25 years old. + He lives at 123 Main St, New York, USA + and has a summer house at 456 Beach Rd, Miami, USA + """, + }, + ], + response_model=User, +) + +print(user) +#> { +#> 'name': 'Jason', +#> 'age': 25, +#> 'addresses': [ +#> { +#> 'street': '123 Main St', +#> 'city': 'New York', +#> 'country': 'USA' +#> }, +#> { +#> 'street': '456 Beach Rd', +#> 'city': 'Miami', +#> 'country': 'USA' +#> } +#> ] +#> } +``` diff --git a/docs/integrations/index.md b/docs/integrations/index.md new file mode 100644 index 000000000..e47ce6d2f --- /dev/null +++ b/docs/integrations/index.md @@ -0,0 +1,84 @@ +# Structured Output Integrations + +Welcome to the Instructor integrations guide. This section provides detailed information about using structured outputs with various AI model providers. 
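+
+Every provider follows the same basic pattern: wrap the provider's native client with the matching `instructor.from_*` constructor, then pass a Pydantic model as `response_model` to your calls. As a rough sketch of that shared pattern (shown here with the OpenAI client; other providers swap in their own constructor, client, and model names):
+
+```python
+import instructor
+from openai import OpenAI
+from pydantic import BaseModel
+
+
+class User(BaseModel):
+    name: str
+    age: int
+
+
+# Wrap the provider's client so create() accepts response_model
+client = instructor.from_openai(OpenAI())
+
+user = client.chat.completions.create(
+    model="gpt-4o-mini",
+    messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
+    response_model=User,  # output is validated against the Pydantic model
+)
+
+print(user)
+#> User(name='Jason', age=25)
+```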
+ +## Supported Providers + +Instructor supports a wide range of AI model providers, each with their own capabilities and features: + +### OpenAI-Compatible Models +- [OpenAI](./openai.md) - GPT-3.5, GPT-4, and other OpenAI models +- [Azure OpenAI](./azure.md) - Microsoft's Azure-hosted OpenAI models + +### Open Source & Self-Hosted Models +- [Ollama](./ollama.md) - Run open-source models locally +- [llama-cpp-python](./llama-cpp-python.md) - Python bindings for llama.cpp +- [Together AI](./together.md) - Host and run open source models + +### Cloud AI Providers +- [Anthropic](./anthropic.md) - Claude and Claude 2 models +- [Google](./google.md) - PaLM and Gemini models +- [Vertex AI](./vertex.md) - Google Cloud's AI platform +- [Cohere](./cohere.md) - Command and other Cohere models +- [Groq](./groq.md) - High-performance inference platform +- [Mistral](./mistral.md) - Mistral's hosted models +- [Fireworks](./fireworks.md) - High-performance model inference +- [Cerebras](./cerebras.md) - AI accelerator platform + +### Model Management +- [LiteLLM](./litellm.md) - Unified interface for multiple providers + +## Features Support Matrix + +Not all providers support all features. Here's a quick overview: + +| Provider | Streaming | Function Calling | Vision | RAG Support | +|----------|-----------|------------------|---------|-------------| +| OpenAI | ✅ | ✅ | ✅ | ✅ | +| Anthropic | ✅ | ✅ | ✅ | ✅ | +| Google | ✅ | ✅ | ✅ | ✅ | +| Vertex AI | ✅ | ✅ | ✅ | ✅ | +| Cohere | ❌ | ✅ | ❌ | ✅ | +| Ollama | ✅ | ✅ | ✅ | ✅ | +| llama-cpp | ✅ | ✅ | ❌ | ✅ | +| Together | ✅ | ✅ | ❌ | ✅ | +| Groq | ✅ | ✅ | ❌ | ✅ | +| Mistral | ✅ | ✅ | ❌ | ✅ | +| Fireworks | ⚠️ | ✅ | ❌ | ✅ | +| Cerebras | ❌ | ✅ | ❌ | ✅ | +| LiteLLM | ⚠️ | ✅ | ⚠️ | ✅ | + +Legend: +- ✅ Full support +- ⚠️ Limited support (provider/model dependent) +- ❌ Not supported + +## Getting Started + +To get started with any provider: + +1. Install the required dependencies +2. Set up your API credentials +3. Initialize the client with Instructor +4. Define your Pydantic models +5. Make API calls with structured outputs + +For detailed instructions, click on any provider in the list above. + +## Common Concepts + +All integrations share some common concepts: + +- [Data Validation](../concepts/validation.md) +- [Streaming Support](../concepts/partial.md) +- [Model Validation](../concepts/models.md) +- [Instructor Hooks](../concepts/hooks.md) + +## Need Help? + +If you need help with a specific integration: + +1. Check the provider-specific documentation +2. Look at the [examples](../examples/index.md) +3. Check our [GitHub issues](https://github.com/jxnl/instructor/issues) +4. Join our [Discord community](https://discord.gg/CV8sPM5k5Y) diff --git a/docs/integrations/litellm.md b/docs/integrations/litellm.md new file mode 100644 index 000000000..671f5fd25 --- /dev/null +++ b/docs/integrations/litellm.md @@ -0,0 +1,85 @@ +--- +title: "Structured outputs with LiteLLM, a complete guide w/ instructor" +description: "Complete guide to using Instructor with LiteLLM's unified interface. Learn how to generate structured, type-safe outputs across multiple LLM providers." +--- + +# Structured outputs with LiteLLM, a complete guide w/ instructor + +LiteLLM provides a unified interface for multiple LLM providers, making it easy to switch between different models and providers. This guide shows you how to use Instructor with LiteLLM for type-safe, validated responses across various LLM providers. 
+
+## Quick Start
+
+Install Instructor with LiteLLM support:
+
+```bash
+pip install "instructor[litellm]"
+```
+
+## Simple User Example (Sync)
+
+```python
+from litellm import completion
+import instructor
+from pydantic import BaseModel
+
+# Enable instructor patches
+client = instructor.from_litellm(completion)
+
+class User(BaseModel):
+    name: str
+    age: int
+
+# Create structured output
+user = client.chat.completions.create(
+    model="gpt-3.5-turbo",  # Can use any supported model
+    messages=[
+        {"role": "user", "content": "Extract: Jason is 25 years old"},
+    ],
+    response_model=User,
+)
+
+print(user)  # User(name='Jason', age=25)
+```
+
+## Simple User Example (Async)
+
+```python
+from litellm import acompletion
+import instructor
+from pydantic import BaseModel
+import asyncio
+
+# Enable instructor patches for async
+client = instructor.from_litellm(acompletion)
+
+class User(BaseModel):
+    name: str
+    age: int
+
+async def extract_user():
+    user = await client.chat.completions.create(
+        model="gpt-3.5-turbo",
+        messages=[
+            {"role": "user", "content": "Extract: Jason is 25 years old"},
+        ],
+        response_model=User,
+    )
+    return user
+
+# Run async function
+user = asyncio.run(extract_user())
+print(user)  # User(name='Jason', age=25)
+```
+
+## Related Resources
+
+- [LiteLLM Documentation](https://docs.litellm.ai/)
+- [Instructor Core Concepts](../concepts/index.md)
+- [Type Validation Guide](../concepts/validation.md)
+- [Advanced Usage Examples](../examples/index.md)
+
+## Updates and Compatibility
+
+Instructor maintains compatibility with LiteLLM's latest releases. Check the [changelog](https://github.com/jxnl/instructor/blob/main/CHANGELOG.md) for updates.
+
+Note: Always verify provider-specific features and limitations in their respective documentation before implementation.
diff --git a/docs/integrations/llama-cpp-python.md b/docs/integrations/llama-cpp-python.md
new file mode 100644
index 000000000..767601799
--- /dev/null
+++ b/docs/integrations/llama-cpp-python.md
@@ -0,0 +1,83 @@
+---
+draft: False
+date: 2024-02-12
+slug: llama-cpp-python
+tags:
+  - patching
+authors:
+  - jxnl
+---
+
+# Structured outputs with llama-cpp-python, a complete guide w/ instructor
+
+If you want to try this example using `instructor hub`, you can pull it by running
+
+```bash
+instructor hub pull --slug llama-cpp-python --py > llama_cpp_python_example.py
+```
+
+Open-source LLMs are gaining popularity, and llama-cpp-python makes it possible to obtain structured outputs from `llama-cpp` models using JSON schema via a mixture of [constrained sampling](https://llama-cpp-python.readthedocs.io/en/latest/#json-schema-mode) and [speculative decoding](https://llama-cpp-python.readthedocs.io/en/latest/#speculative-decoding).
+
+They also support an [OpenAI-compatible client](https://llama-cpp-python.readthedocs.io/en/latest/#openai-compatible-web-server), which can be used to obtain structured output as an in-process mechanism that avoids any network dependency.
+
+
+
+## Patching
+
+Instructor's patch enhances the create call with the following features:
+
+- `response_model` in `create` calls that returns a pydantic model
+- `max_retries` in `create` calls that retries the call if it fails, using a backoff strategy
+
+!!! note "Learn More"
+
+    To learn more, please refer to the [docs](../index.md). To understand the benefits of using Pydantic with Instructor, visit the tips and tricks section of the [why use Pydantic](../why.md) page. 
If you want to check out examples of using Pydantic with Instructor, visit the [examples](../examples/index.md) page.
+
+## llama-cpp-python
+
+Recently llama-cpp-python added support for structured outputs via JSON schema mode, which is a time-saving alternative to extensive prompt engineering.
+
+In this example we'll use JSON_SCHEMA mode to extract structured data from a local model; the same setup also supports streaming out partial models. To learn more, check out the [partial streaming](../concepts/partial.md) docs.
+
+```python
+import llama_cpp
+import instructor
+from llama_cpp.llama_speculative import LlamaPromptLookupDecoding
+from pydantic import BaseModel
+
+
+llama = llama_cpp.Llama(
+    model_path="../../models/OpenHermes-2.5-Mistral-7B-GGUF/openhermes-2.5-mistral-7b.Q4_K_M.gguf",
+    n_gpu_layers=-1,
+    chat_format="chatml",
+    n_ctx=2048,
+    draft_model=LlamaPromptLookupDecoding(num_pred_tokens=2),
+    logits_all=True,
+    verbose=False,
+)
+
+
+create = instructor.patch(
+    create=llama.create_chat_completion_openai_v1,
+    mode=instructor.Mode.JSON_SCHEMA,
+)
+
+
+class UserDetail(BaseModel):
+    name: str
+    age: int
+
+
+user = create(
+    messages=[
+        {
+            "role": "user",
+            "content": "Extract `Jason is 30 years old`",
+        }
+    ],
+    response_model=UserDetail,
+)
+
+print(user)
+#> name='Jason' age=30
+```
diff --git a/docs/integrations/mistral.md b/docs/integrations/mistral.md
new file mode 100644
index 000000000..a27ac8702
--- /dev/null
+++ b/docs/integrations/mistral.md
@@ -0,0 +1,53 @@
+---
+draft: False
+date: 2024-02-26
+slug: mistral
+tags:
+  - patching
+authors:
+  - shanktt
+---
+
+# Structured outputs with Mistral, a complete guide w/ instructor
+
+If you want to try this example using `instructor hub`, you can pull it by running
+
+```bash
+instructor hub pull --slug mistral --py > mistral_example.py
+```
+
+Mistral Large is the flagship model from Mistral AI, supporting 32k context windows and function calling abilities. Mistral Large's addition of [function calling](https://docs.mistral.ai/guides/function-calling/) makes it possible to obtain structured outputs using JSON schema.
+
+By the end of this blog post, you will learn how to effectively utilize Instructor with Mistral Large.
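+
+Before running the example, you'll need a Mistral API key exported as `MISTRAL_API_KEY` (the code below reads it from the environment) and the Mistral extra installed. Assuming the standard extras naming, that looks something like:
+
+```bash
+export MISTRAL_API_KEY=<your-api-key>
+pip install "instructor[mistral]"
+```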
+ +```python +import os +from pydantic import BaseModel +from mistralai import Mistral +from instructor import from_mistral, Mode + + +class UserDetails(BaseModel): + name: str + age: int + + +# enables `response_model` in chat call +client = Mistral(api_key=os.environ.get("MISTRAL_API_KEY")) + +instructor_client = from_mistral( + client=client, + model="mistral-large-latest", + mode=Mode.MISTRAL_TOOLS, + max_tokens=1000, +) + +resp = instructor_client.messages.create( + response_model=UserDetails, + messages=[{"role": "user", "content": "Jason is 10"}], + temperature=0, +) + +print(resp) + +``` diff --git a/docs/hub/ollama.md b/docs/integrations/ollama.md similarity index 97% rename from docs/hub/ollama.md rename to docs/integrations/ollama.md index 4b499befe..5736e67c5 100644 --- a/docs/hub/ollama.md +++ b/docs/integrations/ollama.md @@ -9,7 +9,7 @@ authors: - jxnl --- -# Structured Outputs with Ollama +# Structured outputs with Ollama, a complete guide w/ instructor If you want to try this example using `instructor hub`, you can pull it by running diff --git a/docs/integrations/openai.md b/docs/integrations/openai.md new file mode 100644 index 000000000..d8dd58ad1 --- /dev/null +++ b/docs/integrations/openai.md @@ -0,0 +1,275 @@ +--- +title: "Structured outputs with OpenAI, a complete guide w/ instructor" +description: "Learn how to use Instructor with OpenAI's models for type-safe, structured outputs. Complete guide with examples and best practices for GPT-4 and other OpenAI models." +--- + +# OpenAI Integration with Instructor + +OpenAI is the primary integration for Instructor, offering robust support for structured outputs with GPT-3.5, GPT-4, and future models. This guide covers everything you need to know about using OpenAI with Instructor for type-safe, validated responses. + +## Quick Start + +Instructor comes with support for OpenAI out of the box, so you don't need to install anything extra. + +```bash +pip install "instructor" +``` + +⚠️ **Important**: You must set your OpenAI API key before using the client. You can do this in two ways: + +1. Set the environment variable: + +```bash +export OPENAI_API_KEY='your-api-key-here' +``` + +2. 
Or provide it directly to the client: + +```python +import os +from openai import OpenAI +client = OpenAI(api_key='your-api-key-here') +``` + +## Simple User Example (Sync) + +```python +import os +from openai import OpenAI +import instructor +from pydantic import BaseModel + +# Initialize with API key +client = OpenAI(api_key=os.getenv('OPENAI_API_KEY')) + +# Enable instructor patches for OpenAI client +client = instructor.from_openai(client) + +class User(BaseModel): + name: str + age: int + +# Create structured output +user = client.chat.completions.create( + model="gpt-4o-mini", + messages=[ + {"role": "user", "content": "Extract: Jason is 25 years old"}, + ], + response_model=User, +) + +print(user) +#> User(name='Jason', age=25) +``` + +## Simple User Example (Async) + +```python +import os +from openai import AsyncOpenAI +import instructor +from pydantic import BaseModel +import asyncio + +# Initialize with API key +client = AsyncOpenAI(api_key=os.getenv('OPENAI_API_KEY')) + +# Enable instructor patches for async OpenAI client +client = instructor.from_openai(client) + +class User(BaseModel): + name: str + age: int + +async def extract_user(): + user = await client.chat.completions.create( + model="gpt-4-turbo-preview", + messages=[ + {"role": "user", "content": "Extract: Jason is 25 years old"}, + ], + response_model=User, + ) + return user + +# Run async function +user = asyncio.run(extract_user()) +print(user) +#> User(name='Jason', age=25) +``` + +## Nested Example + +```python +from pydantic import BaseModel +from typing import List +import os +from openai import OpenAI +import instructor +from pydantic import BaseModel + +class Address(BaseModel): + street: str + city: str + country: str + +class User(BaseModel): + name: str + age: int + addresses: List[Address] + +# Initialize with API key +client = OpenAI(api_key=os.getenv('OPENAI_API_KEY')) + +# Enable instructor patches for OpenAI client +client = instructor.from_openai(client) +# Create structured output with nested objects +user = client.chat.completions.create( + model="gpt-4o-mini", + messages=[ + {"role": "user", "content": """ + Extract: Jason is 25 years old. + He lives at 123 Main St, New York, USA + and has a summer house at 456 Beach Rd, Miami, USA + """}, + ], + response_model=User, +) + +print(user) +#> { +#> 'name': 'Jason', +#> 'age': 25, +#> 'addresses': [ +#> { +#> 'street': '123 Main St', +#> 'city': 'New York', +#> 'country': 'USA' +#> }, +#> { +#> 'street': '456 Beach Rd', +#> 'city': 'Miami', +#> 'country': 'USA' +#> } +#> ] +#> } +``` + +## Streaming Support + +Instructor has two main ways that you can use to stream responses out + +1. **Iterables**: These are useful when you'd like to stream a list of objects of the same type (Eg. use structured outputs to extract multiple users) +2. **Partial Streaming**: This is useful when you'd like to stream a single object and you'd like to immediately start processing the response as it comes in. 
+
+### Partials
+
+```python
+from instructor import from_openai
+import openai
+from pydantic import BaseModel
+
+client = from_openai(openai.OpenAI())
+
+
+class User(BaseModel):
+    name: str
+    age: int
+    bio: str
+
+
+user = client.chat.completions.create_partial(
+    model="gpt-4o-mini",
+    messages=[
+        {"role": "user", "content": "Create a user profile for Jason, age 25"},
+    ],
+    response_model=User,
+)
+
+for user_partial in user:
+    print(user_partial)
+
+# > name='Jason' age=None bio='None'
+# > name='Jason' age=25 bio='A tech'
+# > name='Jason' age=25 bio='A tech enthusiast'
+# > name='Jason' age=25 bio='A tech enthusiast who loves coding, gaming, and exploring new'
+# > name='Jason' age=25 bio='A tech enthusiast who loves coding, gaming, and exploring new technologies'
+
+```
+
+### Iterable Example
+
+```python
+import os
+from openai import OpenAI
+import instructor
+from pydantic import BaseModel
+
+# Initialize with API key and enable instructor patches
+client = instructor.from_openai(OpenAI(api_key=os.getenv('OPENAI_API_KEY')))
+
+class User(BaseModel):
+    name: str
+    age: int
+
+# Extract multiple users from text
+users = client.chat.completions.create_iterable(
+    model="gpt-4o-mini",
+    messages=[
+        {"role": "user", "content": """
+            Extract users:
+            1. Jason is 25 years old
+            2. Sarah is 30 years old
+            3. Mike is 28 years old
+        """},
+    ],
+    response_model=User,
+)
+
+for user in users:
+    print(user)
+    #> name='Jason' age=25
+    #> name='Sarah' age=30
+    #> name='Mike' age=28
+```
+
+## Instructor Modes
+
+We provide several modes to make it easy to work with the different response models that OpenAI supports.
+
+1. `instructor.Mode.TOOLS` : This uses the [tool calling API](https://platform.openai.com/docs/guides/function-calling) to return structured outputs to the client
+2. `instructor.Mode.JSON` : This forces the model to return JSON by using [OpenAI's JSON mode](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
+3. `instructor.Mode.FUNCTIONS` : This uses OpenAI's function calling API to return structured outputs and will be deprecated in the future.
+4. `instructor.Mode.PARALLEL_TOOLS` : This uses the [parallel tool calling API](https://platform.openai.com/docs/guides/function-calling#configuring-parallel-function-calling) to return structured outputs to the client. This allows the model to generate multiple calls in a single response.
+5. `instructor.Mode.MD_JSON` : This makes a simple call to the OpenAI chat completion API and parses the raw response as JSON.
+6. `instructor.Mode.TOOLS_STRICT` : This uses the new OpenAI structured outputs API to return structured outputs to the client using constrained grammar sampling. This restricts users to a subset of the JSON schema.
+7. `instructor.Mode.JSON_O1` : This is a mode for the `O1` model. We created a new mode because `O1` doesn't support any system messages, tool calling or streaming, so you need to use this mode to use Instructor with `O1`.
+
+In general, we recommend using `Mode.TOOLS` because it's the most flexible and future-proof mode. It has the largest set of features that you can specify your schema in and makes things significantly easier to work with.
+
+## Batch API
+
+We also support batching requests using the `create_batch` method. This is helpful if your request is not time-sensitive, because you'll get a 50% discount on the token cost.
+
+Read more about how to use it [here](../examples/batch_job_oai.md).
+
+## Best Practices
+
+1. **Model Selection** : We recommend using gpt-4o-mini for simpler use cases because it's cheap and works well with a clearly defined objective for structured outputs. 
When the task is more ambiguous, consider upgrading to `4o` or even `O1` depending on your needs.
+
+2. **Performance Optimization** : Streaming with `create_partial` lets you start processing the response as soon as tokens arrive, so it's worth enabling from the get-go. This is especially true if you're using a simple response model.
+
+## Common Use Cases
+
+- Data Extraction
+- Form Parsing
+- API Response Structuring
+- Document Analysis
+- Configuration Generation
+
+## Related Resources
+
+- [OpenAI Documentation](https://platform.openai.com/docs)
+- [Instructor Core Concepts](../concepts/index.md)
+- [Type Validation Guide](../concepts/validation.md)
+- [Advanced Usage Examples](../examples/index.md)
+
+## Updates and Compatibility
+
+Instructor maintains compatibility with the latest OpenAI API versions and models. Check the [changelog](https://github.com/jxnl/instructor/blob/main/CHANGELOG.md) for updates.
diff --git a/docs/hub/together.md b/docs/integrations/together.md
similarity index 97%
rename from docs/hub/together.md
rename to docs/integrations/together.md
index 252179f37..e3d31f0f2 100644
--- a/docs/hub/together.md
+++ b/docs/integrations/together.md
@@ -9,7 +9,7 @@ authors:
   - jxnl
 ---
 
-# Structured Outputs with Together AI
+# Structured outputs with Together AI, a complete guide w/ instructor
 
 If you want to try this example using `instructor hub`, you can pull it by running
 
diff --git a/docs/integrations/vertex.md b/docs/integrations/vertex.md
new file mode 100644
index 000000000..79ead60c2
--- /dev/null
+++ b/docs/integrations/vertex.md
@@ -0,0 +1,107 @@
+---
+title: "Structured outputs with Vertex AI, a complete guide w/ instructor"
+description: "Complete guide to using Instructor with Google Cloud's Vertex AI. Learn how to generate structured, type-safe outputs with enterprise-grade AI capabilities."
+---
+
+# Structured outputs with Vertex AI, a complete guide w/ instructor
+
+Google Cloud's Vertex AI provides enterprise-grade AI capabilities with robust scaling and security features. This guide shows you how to use Instructor with Vertex AI for type-safe, validated responses.
+
+## Quick Start
+
+Install Instructor with Vertex AI support by running the command below.
+
+```bash
+pip install "instructor[vertexai]"
+```
+
+## Simple User Example (Sync)
+
+```python
+import instructor
+import vertexai  # type: ignore
+from vertexai.generative_models import GenerativeModel  # type: ignore
+from pydantic import BaseModel
+
+vertexai.init()
+
+
+class User(BaseModel):
+    name: str
+    age: int
+
+
+client = instructor.from_vertexai(
+    client=GenerativeModel("gemini-1.5-pro-preview-0409"),
+    mode=instructor.Mode.VERTEXAI_TOOLS,
+)
+
+# note that client.chat.completions.create will also work
+resp = client.create(
+    messages=[
+        {
+            "role": "user",
+            "content": "Extract Jason is 25 years old.",
+        }
+    ],
+    response_model=User,
+)
+
+print(resp)
+#> User(name='Jason', age=25)
+```
+
+## Simple User Example (Async)
+
+```python
+import instructor
+import vertexai  # type: ignore
+from vertexai.generative_models import GenerativeModel  # type: ignore
+from pydantic import BaseModel
+import asyncio
+
+vertexai.init()
+
+
+class User(BaseModel):
+    name: str
+    age: int
+
+
+client = instructor.from_vertexai(
+    client=GenerativeModel("gemini-1.5-pro-preview-0409"),
+    mode=instructor.Mode.VERTEXAI_TOOLS,
+    _async=True,
+)
+
+
+async def extract_user():
+    user = await client.create(
+        messages=[
+            {
+                "role": "user",
+                "content": "Extract Jason is 25 years old.",
+            }
+        ],
+        response_model=User,
+    )
+    return user
+
+
+# Run async function
+user = asyncio.run(extract_user())
+print(user)  # User(name='Jason', age=25)
+```
+
+## Related Resources
+
+- [Vertex AI Documentation](https://cloud.google.com/vertex-ai/docs)
+- [Instructor Core Concepts](../concepts/index.md)
+- [Type Validation Guide](../concepts/validation.md)
+- [Advanced Usage Examples](../examples/index.md)
+
+## Updates and Compatibility
+
+Instructor maintains compatibility with Vertex AI's latest API versions. Check the [changelog](https://github.com/jxnl/instructor/blob/main/CHANGELOG.md) for updates.
+
+Note: Some features like partial streaming may not be available due to API limitations. Always check the latest documentation for feature availability.
diff --git a/docs/prompting/ensembling/usp.md b/docs/prompting/ensembling/usp.md
index ac99b49fb..4c55f8e43 100644
--- a/docs/prompting/ensembling/usp.md
+++ b/docs/prompting/ensembling/usp.md
@@ -2,7 +2,7 @@
 description: "Universal Self Prompting is a technique that aims to use unlabeled data to generate exemplars and a more complicated scoring function to select them."
 ---
 
-Universal Self Prompting is a two stage process similar to [Consistency Based Self Adaptive Prompting (COSP)](/cosp.md). Here is a breakdown of the two stages.
+Universal Self Prompting is a two stage process similar to [Consistency Based Self Adaptive Prompting (COSP)](../few_shot/cosp.md). Here is a breakdown of the two stages.
 
 1. **Generate Examples** : LLMs are prompted to generate a collection of candidate responses using a test dataset
 2. **Answer Query** : We then select a few of these model-generated responses as examples to prompt the LLM to obtain a final prediction.
diff --git a/docs/prompting/few_shot/cosp.md b/docs/prompting/few_shot/cosp.md
new file mode 100644
index 000000000..67c1f9c3b
--- /dev/null
+++ b/docs/prompting/few_shot/cosp.md
@@ -0,0 +1,193 @@
+---
+description: "Consistency Based Self Adaptive Prompting (COSP) is a technique that uses entropy and repetitiveness to select high-quality examples for few-shot learning."
+--- + +# Consistency Based Self Adaptive Prompting (COSP) + +COSP is a technique that aims to improve few-shot learning by selecting high-quality examples based on the consistency and confidence of model responses. This approach helps create more effective prompts by identifying examples that the model can process reliably. + +## Overview + +The COSP process involves two main stages: + +1. **Example Generation**: Generate multiple responses for potential examples + + - Run each example through the model multiple times + - Collect responses and confidence scores + +2. **Example Selection**: Select the best examples based on entropy and repetitiveness + - Calculate entropy of responses to measure consistency + - Evaluate repetitiveness to ensure reliability + +## How COSP Works + +### Stage 1: Example Generation + +For each potential example in your dataset: + +1. Generate multiple responses (typically 3-5) +2. Calculate the entropy of these responses +3. Measure the repetitiveness across responses + +```python +from typing import List +from pydantic import BaseModel, Field +import instructor +from openai import OpenAI + +class Response(BaseModel): + content: str = Field(description="The model's response to the prompt") + confidence: float = Field(description="Confidence score between 0 and 1") + +client = instructor.from_openai(OpenAI()) + +def generate_responses(prompt: str, n: int = 3) -> List[Response]: + responses = [] + for _ in range(n): + response = client.chat.completions.create( + model="gpt-4", + messages=[{"role": "user", "content": prompt}], + response_model=Response + ) + responses.append(response) + return responses +``` + +### Stage 2: Example Selection + +Calculate metrics for each example: + +1. **Entropy**: Measure response variability +2. 
**Repetitiveness**: Check response consistency + +```python +import numpy as np +from scipy.stats import entropy + +def calculate_metrics(responses: List[Response]) -> tuple[float, float]: + # Calculate entropy + confidences = [r.confidence for r in responses] + entropy_score = entropy(confidences) + + # Calculate repetitiveness + unique_responses = len(set(r.content for r in responses)) + repetitiveness = 1 - (unique_responses / len(responses)) + + return entropy_score, repetitiveness +``` + +## Implementation Example + +Here's a complete example of COSP implementation: + +```python +from typing import List, Tuple +from pydantic import BaseModel, Field +import instructor +from openai import OpenAI +import numpy as np +from scipy.stats import entropy + +class Example(BaseModel): + text: str + score: float = Field(description="Combined quality score") + entropy: float = Field(description="Entropy of responses") + repetitiveness: float = Field(description="Repetitiveness of responses") + +class COSPSelector: + def __init__(self, client: OpenAI, n_samples: int = 3): + self.client = instructor.from_openai(client) + self.n_samples = n_samples + + def generate_responses(self, prompt: str) -> List[Response]: + return [ + self.client.chat.completions.create( + model="gpt-4", + messages=[{"role": "user", "content": prompt}], + response_model=Response + ) + for _ in range(self.n_samples) + ] + + def calculate_metrics(self, responses: List[Response]) -> Tuple[float, float]: + confidences = [r.confidence for r in responses] + entropy_score = entropy(confidences) + + unique_responses = len(set(r.content for r in responses)) + repetitiveness = 1 - (unique_responses / len(responses)) + + return entropy_score, repetitiveness + + def select_examples(self, candidates: List[str], k: int) -> List[Example]: + examples = [] + + for text in candidates: + responses = self.generate_responses(text) + entropy_score, repetitiveness = self.calculate_metrics(responses) + + # Combined score (lower is better) + score = entropy_score - repetitiveness + + examples.append(Example( + text=text, + score=score, + entropy=entropy_score, + repetitiveness=repetitiveness + )) + + # Sort by score (lower is better) and select top k + return sorted(examples, key=lambda x: x.score)[:k] +``` + +## Usage Example + +```python +# Initialize COSP selector +client = OpenAI() +selector = COSPSelector(client) + +# Candidate examples +candidates = [ + "The quick brown fox jumps over the lazy dog", + "Machine learning is a subset of artificial intelligence", + "Python is a high-level programming language", + # ... more examples +] + +# Select best examples +best_examples = selector.select_examples(candidates, k=3) + +# Use selected examples in your prompt +selected_texts = [ex.text for ex in best_examples] +prompt = f"""Use these examples to guide your response: + +Examples: +{chr(10).join(f'- {text}' for text in selected_texts)} + +Now, please respond to: [your query here] +""" +``` + +## Benefits of COSP + +1. **Improved Consistency**: By selecting examples with low entropy and high repetitiveness +2. **Better Performance**: More reliable few-shot learning +3. **Automated Selection**: No manual example curation needed +4. **Quality Metrics**: Quantifiable measure of example quality + +## Limitations + +1. **Computational Cost**: Requires multiple API calls per example +2. **Time Overhead**: Selection process can be slow for large candidate sets +3. 
**Model Dependency**: Performance may vary across different models + +## Related Techniques + +- [Universal Self Prompting (USP)](../ensembling/usp.md) +- Chain of Thought Prompting +- Self-Consistency + +## References + +1. Original COSP Paper: [arXiv:2305.14121](https://arxiv.org/abs/2305.14121) +2. Related Work: [Self-Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171) diff --git a/docs/prompting/index.md b/docs/prompting/index.md index 67c056ab8..bf6b93371 100644 --- a/docs/prompting/index.md +++ b/docs/prompting/index.md @@ -47,14 +47,14 @@ How do we choose effective examples to include in our prompt? How do we encourage our model to mimic human-like reasoning? -#### Zero Shot +## Zero Shot {#zero-shot-1} 1. [Auto-Generate Chain-Of-Thought Examples](thought_generation/chain_of_thought_zero_shot/analogical_prompting.md) 2. [First Ask a Higher-Level Question](thought_generation/chain_of_thought_zero_shot/step_back_prompting.md) 3. [Encourage Analysis](thought_generation/chain_of_thought_zero_shot/thread_of_thought.md) 4. [Encourage Structural Reasoning](thought_generation/chain_of_thought_zero_shot/tab_cot.md) -#### Few Shot +## Few Shot {#few-shot-1} 5. [Annotate Only Uncertain Examples](thought_generation/chain_of_thought_few_shot/active_prompt.md) 6. [Choose Diverse Examples](thought_generation/chain_of_thought_few_shot/auto_cot.md) 7. [Choose Complex Examples](thought_generation/chain_of_thought_few_shot/complexity_based.md) diff --git a/mkdocs.yml b/mkdocs.yml index 38612fd5b..36cde0d2f 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -1,16 +1,16 @@ site_name: Instructor site_author: Jason Liu -site_description: A lightweight library for structured outputs with LLMs. +site_description: A lightweight library for structured outputs with LLMs. 
repo_name: instructor repo_url: https://github.com/jxnl/instructor/ -site_url: https://python.useinstructor.com/ +site_url: https://python.useinstructor.com/ edit_uri: edit/main/docs/ -copyright: Copyright © 2024 Jason Liu +copyright: Copyright © 2024 Jason Liu theme: name: material icon: repo: fontawesome/brands/github - edit: material/pencil + edit: material/pencil view: material/eye theme: admonition: @@ -55,7 +55,7 @@ theme: # - toc.integrate palette: - scheme: default - primary: black + primary: black accent: indigo toggle: icon: material/brightness-7 @@ -99,7 +99,7 @@ markdown_extensions: - pymdownx.magiclink: normalize_issue_symbols: true repo_url_shorthand: true - user: jxnl + user: jxnl repo: instructor - pymdownx.mark - pymdownx.smartsymbols @@ -118,7 +118,7 @@ markdown_extensions: custom_checkbox: true - pymdownx.arithmatex: generic: true - + extra_javascript: - javascripts/katex.js - https://unpkg.com/katex@0/dist/katex.min.js @@ -127,7 +127,7 @@ extra_javascript: extra_css: - https://unpkg.com/katex@0/dist/katex.min.css nav: - - Introduction: + - Introduction: - Structured Outputs for LLMs: 'index.md' - Why use Instructor?: 'why.md' - Help with Instructor: 'help.md' @@ -135,8 +135,10 @@ nav: - Installation: 'installation.md' - Contributing: 'contributing.md' - Philosophy: 'concepts/philosophy.md' + - API Reference: 'api.md' - Cookbook: - Cookbooks: 'examples/index.md' + - "Recursive Schema Examples": 'examples/recursive.md' - "Enhancing Text Classification": 'examples/classification.md' - "Local Classification with Llama-cpp": 'examples/local_classification.md' - "Structured Outputs with Ollama": 'examples/ollama.md' @@ -160,10 +162,15 @@ nav: - "Intelligent Document Segmentation": 'examples/document_segmentation.md' - "Structured Output with watsonx.ai": 'examples/watsonx.md' - "OpenAI Batch Jobs with Instructor": 'examples/batch_job_oai.md' + - "Structured Outputs with Groq": 'examples/groq.md' + - "Structured Outputs with Mistral": 'examples/mistral.md' - Blog: - "blog/index.md" - Concepts: + - Overview: 'concepts/index.md' - Models: 'concepts/models.md' + - Lists and Arrays: 'concepts/lists.md' + - Prompting: 'concepts/prompting.md' - Multimodal : 'concepts/multimodal.md' - Retrying: 'concepts/retrying.md' - Patching: 'concepts/patching.md' @@ -175,7 +182,7 @@ nav: - Usage Tokens: 'concepts/usage.md' - Missing: "concepts/maybe.md" - Parallel Tools: 'concepts/parallel.md' - - Stream Iterable: "concepts/lists.md" + - Stream Iterable: "concepts/iterable.md" - Stream Partial: "concepts/partial.md" - Raw Response: 'concepts/raw_response.md' - FastAPI: 'concepts/fastapi.md' @@ -184,42 +191,54 @@ nav: - Logging: 'concepts/logging.md' - Distillation: "concepts/distillation.md" - Union: 'concepts/union.md' + - Unions: 'concepts/unions.md' + - Validation: 'concepts/validation.md' - Alias: 'concepts/alias.md' - Enums: 'concepts/enums.md' - Type Adapter: 'concepts/typeadapter.md' - Templating: 'concepts/templating.md' - Hub: - Introduction to Instructor Hub: 'hub/index.md' - - Structured Outputs with Vertex AI: 'hub/vertexai.md' - - Structured Outputs with Ollama: 'hub/ollama.md' - - Structured Outputs with llama-cpp-python: 'hub/llama-cpp-python.md' - - Structured Outputs with Together: 'hub/together.md' - - Structured Outputs with Anyscale: 'hub/anyscale.md' - - Structured Outputs with Groq: 'hub/groq.md' - - Structured Outputs with Mistral: 'hub/mistral.md' - - Structured Outputs with Cohere: 'hub/cohere.md' - - Classification with Structured Outputs: 'hub/single_classification.md' 
- - Bulk Classification with Structured Outputs: 'hub/multiple_classification.md' - - Extracting Tables with Structured Outputs: 'hub/tables_from_vision.md' - - Creating Pandas DataFrames with Structured Outputs: 'hub/pandas_df.md' + - Classification: 'hub/single_classification.md' + - Bulk Classification: 'hub/multiple_classification.md' + - Extracting Tables: 'hub/tables_from_vision.md' + - Creating Pandas DataFrames: 'hub/pandas_df.md' - Bulk Async Classification with LangSmith: 'hub/batch_classification_langsmith.md' - - Extracting Action Items with Structured Outputs: 'hub/action_items.md' + - Extracting Action Items: 'hub/action_items.md' - Implementing Partial Streaming Responses: 'hub/partial_streaming.md' - - Extracting Contact Information with Structured Outputs: 'hub/extract_contact_info.md' - - Generating Knowledge Graphs with Structured Outputs: 'hub/knowledge_graph.md' + - Extracting Contact Information: 'hub/extract_contact_info.md' + - Generating Knowledge Graphs: 'hub/knowledge_graph.md' - Extracting Relevant Clips from YouTube Videos: "hub/youtube_clips.md" - - Building Knowledge Graphs with Structured Outputs: 'tutorials/5-knowledge-graphs.ipynb' + - Integrations: + - Overview: 'integrations/index.md' + - Anthropic: 'integrations/anthropic.md' + - Azure OpenAI: 'integrations/azure.md' + - Cerebras: 'integrations/cerebras.md' + - Cohere: 'integrations/cohere.md' + - Fireworks: 'integrations/fireworks.md' + - Gemini: 'integrations/google.md' + - Groq: 'integrations/groq.md' + - LiteLLM: 'integrations/litellm.md' + - llama-cpp-python: 'integrations/llama-cpp-python.md' + - Mistral: 'integrations/mistral.md' + - Ollama: 'integrations/ollama.md' + - OpenAI: 'integrations/openai.md' + - Together: 'integrations/together.md' + - Vertex AI: 'integrations/vertex.md' - CLI Reference: - "CLI Reference": "cli/index.md" - "Finetuning GPT-3.5": "cli/finetune.md" - "Usage Tracking": "cli/usage.md" - "Batch Jobs": "cli/batch.md" - Tutorials: + - Overview: 'tutorials/index.md' - Tutorials (Notebooks): 'tutorials/1-introduction.ipynb' - Tips and Tricks: 'tutorials/2-tips.ipynb' - Applications RAG: 'tutorials/3-0-applications-rag.ipynb' - Applications RAG - 2: 'tutorials/3-1-validation-rag.ipynb' - Validation: 'tutorials/4-validation.ipynb' + - Knowledge Graphs: 'tutorials/5-knowledge-graphs.ipynb' + - Chain of Density: 'tutorials/6-chain-of-density.ipynb' - Synthetic Data Generation: 'tutorials/7-synthetic-data-generation.ipynb' - Jobs Board (External): - Jobs: 'jobs.md' @@ -231,7 +250,7 @@ nav: - Define A Style: 'prompting/zero_shot/style_prompting.md' - Auto-Refine The Prompt: 'prompting/zero_shot/s2a.md' - Simulate A Perspective: 'prompting/zero_shot/simtom.md' - - Clarify Ambiguous Information: 'prompting/zero_shot/rar.md' + - Clarify Ambiguous Information: 'prompting/zero_shot/rar.md' - Ask Model To Repeat Query: 'prompting/zero_shot/re2.md' - Generate Follow-Up Questions: 'prompting/zero_shot/self_ask.md' - Few-Shot: @@ -241,6 +260,7 @@ nav: - Exemplar Selection: - Select Effective Examples: 'prompting/few_shot/exemplar_selection/knn.md' - Vote-K: 'prompting/few_shot/exemplar_selection/vote_k.md' + - Consistent Based Examples: 'prompting/few_shot/cosp.md' - Thought Generation: - Chain-Of-Thought (Zero-Shot): - Generate Examples First: 'prompting/thought_generation/chain_of_thought_zero_shot/analogical_prompting.md' @@ -286,12 +306,44 @@ plugins: - redirects: redirect_maps: jobs.md: https://jobs.applied-llms.org/ + # LLM client redirects + hub/ollama.md: integrations/ollama.md + 
hub/llama-cpp-python.md: integrations/llama-cpp-python.md + hub/anthropic.md: integrations/anthropic.md + hub/azure.md: integrations/azure.md + hub/cerebras.md: integrations/cerebras.md + hub/cohere.md: integrations/cohere.md + hub/fireworks.md: integrations/fireworks.md + hub/google.md: integrations/google.md + hub/groq.md: integrations/groq.md + hub/litellm.md: integrations/litellm.md + hub/mistral.md: integrations/mistral.md + hub/openai.md: integrations/openai.md + hub/together.md: integrations/together.md + hub/vertex.md: integrations/vertex.md + hub/vertexai.md: integrations/vertex.md # Handle old vertexai.md references + # Legacy hub/clients/ redirects + 'hub/clients/google.md': 'integrations/google.md' + 'hub/clients/litellm.md': 'integrations/litellm.md' + 'hub/clients/ollama.md': 'integrations/ollama.md' + 'hub/clients/llama-cpp-python.md': 'integrations/llama-cpp-python.md' + 'hub/clients/anthropic.md': 'integrations/anthropic.md' + 'hub/clients/azure.md': 'integrations/azure.md' + 'hub/clients/cerebras.md': 'integrations/cerebras.md' + 'hub/clients/cohere.md': 'integrations/cohere.md' + 'hub/clients/fireworks.md': 'integrations/fireworks.md' + 'hub/clients/groq.md': 'integrations/groq.md' + 'hub/clients/mistral.md': 'integrations/mistral.md' + 'hub/clients/openai.md': 'integrations/openai.md' + 'hub/clients/together.md': 'integrations/together.md' + 'hub/clients/vertex.md': 'integrations/vertex.md' + 'hub/clients/vertexai.md': 'integrations/vertex.md' - mkdocs-jupyter: ignore_h1_titles: true execute: false - social - search: - separator: '[\s\u200b\-_,:!=\[\]()"`/]+|\.(?!\d)|&[lg]t;|(?!\b)(?=[A-Z][a-z])' + separator: '[\s\u200b\-_,:!=\[\]()"`/]+|\.(?!\b)(?=[A-Z][a-z])' - minify: minify_html: true - mkdocstrings: @@ -326,7 +378,7 @@ extra: - icon: material/emoticon-sad-outline name: This page could be improved data: 0 - note: >- + note: >- Thanks for your feedback! Help us improve this page by using our feedback form. social: