Welcome to the Guide to Deploying AI Agents with Triton Inference Server. This repository provides a set of tutorials designed to help you deploy AI agents efficiently using the Triton Inference Server. This guide is intended for users who are already familiar with the basics of Triton and are looking to expand their knowledge.
For beginners, we recommend starting with the Conceptual Guide, which covers foundational concepts and basic setup of Triton Inference Server.
Modern large language models (LLMs) are integral components of AI agents — sophisticated self-governing systems that make decisions by interacting with their environment and analyzing the data they gather. By integrating LLMs, AI agents can understand, generate, and respond to human language with high proficiency, enabling them to perform complex tasks such as language translation, content generation, and conversational interactions.
- **Constrained Decoding**: Learn about constrained decoding, how to implement it in Triton, and explore practical examples and use cases. A minimal sketch of the core idea follows this list.
- **Function Calling**: Discover how to set up and use function calling within AI models served by Triton. This section includes detailed instructions and examples to help you integrate function calling into your deployments; a small client-side sketch follows this list.
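To make the constrained-decoding idea concrete before you dive into the tutorial, here is a minimal sketch of the technique using a custom logits processor from the Hugging Face `transformers` library. The `gpt2` checkpoint, the yes/no use case, and the `AllowedTokensProcessor` class are illustrative assumptions, not the tutorial's actual setup; the Triton tutorial covers how to wire this into a deployed model.

```python
# A minimal sketch of constrained decoding: at every decoding step, mask
# out every vocabulary entry except an allowed set of token ids.
# Assumes the Hugging Face "transformers" library; model and token set
# are placeholders, not the tutorial's configuration.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    LogitsProcessor,
    LogitsProcessorList,
)


class AllowedTokensProcessor(LogitsProcessor):
    """Sets the score of every token outside the allowed set to -inf."""

    def __init__(self, allowed_token_ids):
        self.allowed = torch.tensor(sorted(allowed_token_ids))

    def __call__(self, input_ids, scores):
        mask = torch.full_like(scores, float("-inf"))
        mask[:, self.allowed] = scores[:, self.allowed]
        return mask


tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Constrain generation to " yes", " no", and end-of-sequence
# (a hypothetical use case for illustration).
allowed = [
    tokenizer.encode(" yes")[0],
    tokenizer.encode(" no")[0],
    tokenizer.eos_token_id,
]

inputs = tokenizer("Is the sky blue? Answer:", return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=3,
    logits_processor=LogitsProcessorList([AllowedTokensProcessor(allowed)]),
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The same masking principle underlies schema- and grammar-based constraints: the allowed set is simply recomputed at each step from the tokens that keep the output valid.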
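Likewise, the following sketch shows the client-side half of function calling: parsing a model response into a tool invocation and dispatching it. The JSON format, the `TOOLS` registry, and the `get_weather` helper are hypothetical stand-ins for whatever schema your deployment defines, as detailed in the Function Calling section.

```python
# A minimal sketch of client-side tool dispatch for function calling.
# The output format and tool registry here are assumptions for
# illustration; your deployment's schema may differ.
import json


def get_weather(city: str) -> str:
    """Hypothetical tool the agent is allowed to call."""
    return f"Sunny and 22 C in {city}"


# Registry mapping tool names the model may emit to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse a response of the assumed form
    {"name": "<tool>", "arguments": {...}} and invoke the matching tool;
    anything that is not valid JSON is treated as a plain-text answer."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output
    fn = TOOLS.get(call.get("name"))
    if fn is None:
        raise ValueError(f"Unknown tool: {call.get('name')}")
    return fn(**call.get("arguments", {}))


# Example: the LLM decided to call a tool.
print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

In practice this loop runs alongside the model: tool results are appended to the conversation and sent back so the model can produce its final answer.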