Merge branch 'main' into hf/1104

ivanbelenky authored Oct 23, 2024
2 parents e3efd03 + f63e1d0 commit df07c39
Showing 4 changed files with 450 additions and 3 deletions.
5 changes: 2 additions & 3 deletions docs/blog/posts/anthropic.md
@@ -5,8 +5,7 @@ categories:
 - Anthropic
 comments: true
 date: 2024-03-20
-description: Enhance your projects with the new Anthropic client support, featuring
-  installation guidance and user model creation.
+description: Learn how to integrate Anthropic's powerful language models into your projects using Instructor, with step-by-step guidance on installation, client setup, and creating structured outputs with Pydantic models.
 draft: false
 tags:
 - Anthropic
@@ -16,7 +15,7 @@ tags:
 - LLM Techniques
 ---
 
-# Announcing Anthropic Support
+# Structured Outputs with Anthropic
 
 A special shoutout to [Shreya](https://twitter.com/shreyaw_) for her contributions to the anthropic support. As of now, all features are operational with the exception of streaming support.
215 changes: 215 additions & 0 deletions docs/blog/posts/multimodal-gemini.md
@@ -0,0 +1,215 @@
---
authors:
- ivanleomk
categories:
- Gemini
- Multimodal
comments: true
date: 2024-10-23
description: Learn how to use Google's Gemini model for multimodal structured extraction of YouTube videos, extracting structured recommendations for tourist destinations.
draft: false
tags:
- Gemini
- Multimodal AI
- Travel Recommendations
- Pydantic
- Python
---

# Structured Outputs with Multimodal Gemini

In this post, we'll explore how to use Google's Gemini model with Instructor to analyze [travel videos](https://www.youtube.com/watch?v=_R8yhW_H9NQ) and extract structured recommendations. This powerful combination allows us to process multimodal inputs (video) and generate structured outputs using Pydantic models. This post was written in collaboration with [Kino.ai](https://kino.ai), a company that uses Instructor for structured extraction from multimodal inputs to improve search for filmmakers.

## Setting Up the Environment

First, let's set up our environment with the necessary libraries:

```python
from pydantic import BaseModel
import instructor
import google.generativeai as genai
```
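
If you haven't set up credentials yet, the Gemini SDK reads an API key through `genai.configure`. A minimal sketch, assuming the key is exported as `GOOGLE_API_KEY`:

```python
import os

# Assumes GOOGLE_API_KEY is set in your environment.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
```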

## Defining Our Data Models

We'll use Pydantic to define our data models for tourist destinations and recommendations:

```python
class TouristDestination(BaseModel):
    name: str
    description: str
    location: str


class Recommendations(BaseModel):
    chain_of_thought: str
    description: str
    destinations: list[TouristDestination]
```

## Initializing the Gemini Client

Next, we'll set up our Gemini client using Instructor:

```python
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    ),
)
```
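
Instructor also supports other extraction modes for Gemini. A hedged sketch, assuming `instructor.Mode.GEMINI_JSON` is available in your installed version:

```python
# Alternative: request JSON output from Gemini instead of tool calls.
client = instructor.from_gemini(
    client=genai.GenerativeModel(model_name="models/gemini-1.5-flash-latest"),
    mode=instructor.Mode.GEMINI_JSON,
)
```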

## Uploading and Processing the Video

To analyze a video, we first need to upload it:

```python
file = genai.upload_file("./takayama.mp4")
```
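
Uploaded videos are processed asynchronously on Google's side, so it's worth polling until the file leaves the `PROCESSING` state before querying the model. A minimal sketch (the polling interval is arbitrary):

```python
import time

# Wait for server-side processing of the uploaded video to finish.
while file.state.name == "PROCESSING":
    time.sleep(5)
    file = genai.get_file(file.name)

if file.state.name != "ACTIVE":
    raise RuntimeError(f"Video upload failed with state {file.state.name}")
```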

Then, we can process the video and extract recommendations:

```python
resp = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": ["What places do they recommend in this video?", file],
        }
    ],
    response_model=Recommendations,
)

print(resp)
```

??? note "Expand to see Raw Results"

    ```python
    Recommendations(
        chain_of_thought='The video recommends visiting Takayama city, in the Hida Region, Gifu Prefecture. The video suggests visiting the Miyagawa Morning Market, to try the Sarubobo good luck charms, and to enjoy the cookie cup espresso, made by Koma Coffee. Then, the video suggests visiting a traditional Japanese Cafe, called Kissako Katsure, and try their matcha and sweets. Afterwards, the video suggests to visit the Sanmachi Historic District, where you can find local crafts and delicious foods. The video recommends trying Hida Wagyu beef, at the Kin no Kotte Ushi shop, or to have a sit-down meal at the Kitchen Hida. Finally, the video recommends visiting Shirakawa-go, a World Heritage Site in Gifu Prefecture.',
        description='This video recommends a number of places to visit in Takayama city, in the Hida Region, Gifu Prefecture. It shows some of the local street food and highlights some of the unique shops and restaurants in the area.',
        destinations=[
            TouristDestination(
                name='Takayama',
                description='Takayama is a city at the base of the Japan Alps, located in the Hida Region of Gifu.',
                location='Hida Region, Gifu Prefecture'
            ),
            TouristDestination(
                name='Miyagawa Morning Market',
                description="The Miyagawa Morning Market, or the Miyagawa Asai-chi in Japanese, is a market that has existed officially since the Edo Period, more than 100 years ago. It's open every single day, rain or shine, from 7am to noon.",
                location='Hida Takayama'
            ),
            TouristDestination(
                name='Nakaya - Handmade Hida Sarubobo',
                description='The Nakaya shop sells handcrafted Sarubobo good luck charms.',
                location='Hida Takayama'
            ),
            TouristDestination(
                name='Koma Coffee',
                description="Koma Coffee is a shop that has been in business for about 50 or 60 years, and they serve coffee in a cookie cup. They've been serving coffee for about 10 years.",
                location='Hida Takayama'
            ),
            TouristDestination(
                name='Kissako Katsure',
                description='Kissako Katsure is a traditional Japanese style cafe, called Kissako, and the name means would you like to have some tea. They have a variety of teas and sweets.',
                location='Hida Takayama'
            ),
            TouristDestination(
                name='Sanmachi Historic District',
                description='Sanmachi Dori is a Historic Merchant District in Takayama, all of the buildings here have been preserved to look as they did in the Edo Period.',
                location='Hida Takayama'
            ),
            TouristDestination(
                name='Suwa Orchard',
                description='The Suwa Orchard has been in business for more than 50 years.',
                location='Hida Takayama'
            ),
            TouristDestination(
                name='Kitchen HIDA',
                description='Kitchen HIDA is a restaurant with a 50 year history, known for their Hida Beef dishes and for using a lot of local ingredients.',
                location='Hida Takayama'
            ),
            TouristDestination(
                name='Kin no Kotte Ushi',
                description='Kin no Kotte Ushi is a shop known for selling Beef Sushi, especially Hida Wagyu Beef Sushi. Their sushi is medium rare.',
                location='Hida Takayama'
            ),
            TouristDestination(
                name='Shirakawa-go',
                description='Shirakawa-go is a World Heritage Site in Gifu Prefecture.',
                location='Gifu Prefecture'
            )
        ]
    )
    ```

The Gemini model analyzes the video and provides structured recommendations. Here's a summary of the extracted information:

1. **Takayama City**: The main destination, located in the Hida Region of Gifu Prefecture.
2. **Miyagawa Morning Market**: A historic market open daily from 7am to noon.
3. **Nakaya Shop**: Sells handcrafted Sarubobo good luck charms.
4. **Koma Coffee**: A 50-60 year old shop famous for serving coffee in cookie cups.
5. **Kissako Katsure**: A traditional Japanese cafe offering various teas and sweets.
6. **Sanmachi Historic District**: A preserved merchant district from the Edo Period.
7. **Suwa Orchard**: A 50+ year old orchard business.
8. **Kitchen HIDA**: A restaurant with a 50-year history, known for Hida Beef dishes.
9. **Kin no Kotte Ushi**: A shop specializing in Hida Wagyu Beef Sushi.
10. **Shirakawa-go**: A World Heritage Site in Gifu Prefecture.
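
Because `resp` is a validated `Recommendations` instance rather than raw text, downstream code can use its typed attributes directly, for example:

```python
for dest in resp.destinations:
    print(f"{dest.name} ({dest.location})")
```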

## Limitations, Challenges, and Future Directions

While the current approach demonstrates the power of multimodal AI for video analysis, there are several limitations and challenges to consider:

1. **Lack of Temporal Information**: Our current method extracts overall recommendations but doesn't provide timestamps for specific mentions. This limits the ability to link recommendations to exact moments in the video.

2. **Speaker Diarization**: The model doesn't distinguish between different speakers in the video. Implementing speaker diarization could provide valuable context about who is making specific recommendations.

3. **Content Density**: Longer or more complex videos might overwhelm the model, potentially leading to missed information or less accurate extractions.

### Future Explorations

To address these limitations and expand the capabilities of our video analysis system, here are some promising areas to explore:

1. **Timestamp Extraction**: Enhance the model to provide timestamps for each recommendation or point of interest mentioned in the video. This could be achieved by:

    ```python
    from typing import Literal

    class TimestampedRecommendation(BaseModel):
        timestamp: str
        timestamp_format: Literal["HH:MM", "HH:MM:SS"]  # Helps with parsing
        recommendation: str

    class EnhancedRecommendations(BaseModel):
        destinations: list[TouristDestination]
        timestamped_mentions: list[TimestampedRecommendation]
    ```

2. **Speaker Diarization**: Implement speaker recognition to attribute recommendations to specific individuals. This could be particularly useful for videos featuring multiple hosts or interviewees.

3. **Segment-based Analysis**: Process longer videos in segments to maintain accuracy and capture all relevant information. This approach could involve:
    - Splitting the video into smaller chunks
    - Analyzing each chunk separately
    - Aggregating and deduplicating results
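
    A minimal sketch of this segmented flow, assuming a hypothetical `chunk_paths` list produced by splitting the video beforehand (the prompt and deduplication key are illustrative):

    ```python
    def analyze_in_segments(chunk_paths: list[str]) -> list[TouristDestination]:
        # Analyze each chunk separately, then deduplicate results by name.
        seen: dict[str, TouristDestination] = {}
        for path in chunk_paths:
            chunk = genai.upload_file(path)
            resp = client.chat.completions.create(
                messages=[
                    {
                        "role": "user",
                        "content": ["What places do they recommend in this segment?", chunk],
                    }
                ],
                response_model=Recommendations,
            )
            for dest in resp.destinations:
                seen.setdefault(dest.name, dest)
        return list(seen.values())
    ```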

4. **Multi-language Support**: Extend the model's capabilities to accurately analyze videos in various languages and capture culturally specific recommendations.

5. **Visual Element Analysis**: Enhance the model to recognize and describe visual elements like landmarks, food dishes, or activities shown in the video, even if not explicitly mentioned in the audio.

6. **Sentiment Analysis**: Incorporate sentiment analysis to gauge the speaker's enthusiasm or reservations about specific recommendations.

By addressing these challenges and exploring these new directions, we can create a more comprehensive and nuanced video analysis system, opening up even more possibilities for applications in travel, education, and beyond.
137 changes: 137 additions & 0 deletions docs/blog/posts/structured-output-anthropic.md
@@ -0,0 +1,137 @@
---
authors:
- jxnl
categories:
- Anthropic
comments: true
date: 2024-10-23
description: Learn how to leverage Anthropic's Claude with Instructor for structured outputs and prompt caching, enhancing AI application development.
draft: false
tags:
- Anthropic
- API Development
- Pydantic
- Python
- LLM Techniques
- Prompt Caching
---

# Structured Outputs and Prompt Caching with Anthropic

Anthropic's ecosystem now offers two powerful features for AI developers: structured outputs and prompt caching. These advancements enable more efficient use of large language models (LLMs). This guide demonstrates how to leverage these features with the Instructor library to enhance your AI applications.

## Structured Outputs with Anthropic and Instructor

Instructor now offers seamless integration with Anthropic's powerful language models, allowing developers to easily create structured outputs using Pydantic models. This integration simplifies the process of extracting specific information from AI-generated responses.

To get started, you'll need to install Instructor with Anthropic support:

```bash
pip install "instructor[anthropic]"
```

Here's a basic example of how to use Instructor with Anthropic:

```python
from pydantic import BaseModel
from typing import List
import anthropic
import instructor

# Patch the Anthropic client with Instructor
anthropic_client = instructor.from_anthropic(anthropic.Anthropic())

# Define your Pydantic models
class Properties(BaseModel):
    name: str
    value: str

class User(BaseModel):
    name: str
    age: int
    properties: List[Properties]

# Use the patched client to generate structured output
user_response = anthropic_client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Create a user for a model with a name, age, and properties.",
        }
    ],
    response_model=User,
)

print(user_response.model_dump_json(indent=2))
"""
{
  "name": "John Doe",
  "age": 30,
  "properties": [
    { "name": "favorite_color", "value": "blue" }
  ]
}
"""
```

This approach allows you to easily extract structured data from Claude's responses, making it simpler to integrate AI-generated content into your applications.
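
Because the response is validated against your Pydantic model, Instructor can also re-prompt Claude automatically when validation fails. A minimal sketch using Instructor's `max_retries` parameter with the client and model defined above:

```python
# Retry up to 3 times if the output fails Pydantic validation.
user_response = anthropic_client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1024,
    max_retries=3,
    messages=[
        {
            "role": "user",
            "content": "Create a user for a model with a name, age, and properties.",
        }
    ],
    response_model=User,
)
```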

## Prompt Caching: Boosting Performance and Reducing Costs

Anthropic has introduced a new prompt caching feature that can significantly improve response times and reduce costs for applications dealing with large context windows. This feature is particularly useful when making multiple calls with similar large contexts over time.

Here's how you can implement prompt caching with Instructor and Anthropic:

```python
import instructor
from anthropic import Anthropic
from pydantic import BaseModel

# Set up the client with prompt caching
client = instructor.from_anthropic(Anthropic())

# Define your Pydantic model
class Character(BaseModel):
    name: str
    description: str

# Load your large context
with open("./book.txt", "r") as f:
    book = f.read()

# Make multiple calls using the cached context
for _ in range(2):
    resp, completion = client.chat.completions.create_with_completion(
        model="claude-3-haiku-20240307",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "<book>" + book + "</book>",
                        "cache_control": {"type": "ephemeral"},
                    },
                    {
                        "type": "text",
                        "text": "Extract a character from the text given above",
                    },
                ],
            },
        ],
        response_model=Character,
        max_tokens=1000,
    )
```

In this example, the large context (the book content) is cached after the first request and reused in subsequent requests. This can lead to significant time and cost savings, especially when working with extensive context windows.
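
To confirm the cache is actually being used, inspect the raw `completion` object returned by `create_with_completion`. A minimal sketch, assuming the usage fields exposed by Anthropic's prompt-caching beta:

```python
# On the first call, expect cache_creation_input_tokens > 0 (a cache write);
# on the second, cache_read_input_tokens > 0 (a cache hit).
print(completion.usage)
```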

## Conclusion

By combining Anthropic's Claude with Instructor's structured output capabilities and leveraging prompt caching, developers can create more efficient, cost-effective, and powerful AI applications. These features open up new possibilities for building sophisticated AI systems that can handle complex tasks with ease.

As the AI landscape continues to evolve, staying up-to-date with the latest tools and techniques is crucial. We encourage you to explore these features and share your experiences with the community. Happy coding!