1 parent 8b8d9b7, commit 0f180c0
Showing 11 changed files with 54 additions and 14 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,15 +1,16 @@ | ||
# Introduction | ||
|
||
This handbook aims to provide a pragmatic guide to LLMOps. It provides a sufficient understanding of [Large Language Models](/general-concepts/large-language-model), [deployment](/deployments) techniques, and [software engineering](/application-layer) practices to maintain the entire stack. | ||
|
||
It assumes you are interested in self-hosting open source [Large Language Models](/general-concepts/large-language-model). If you only want to use them through HTTP APIs, you can jump straight to the [application layer](/application-layer) best practices. | ||
|
||
## What is LLMOps? | ||
|
||
## Self-Hosted vs Third Party | ||
`LLMOps` is a set of practices that deal with the deployment, maintenance, and scaling of [Large Language Models](/general-concepts/large-language-model). If you want to consider yourself an `LLMOps` practitioner, you should, at minimum, be able to deploy and maintain a scalable setup of multiple running LLM instances. | ||
|
||
## New Class of Opportunities, New Class of Problems | ||
|
||
```mermaid | ||
graph TD; | ||
A-->B; | ||
A-->C; | ||
B-->D; | ||
C-->D; | ||
``` | ||
Although there has been a recent trend of naming everything `*Ops` (`DevOps`, `Product Ops`, `MLOps`, `LLMOps`, `BizOps`, etc.), I think `LLMOps` and `MLOps` truly deserve their place as a standalone set of practices. | ||
|
||
They bridge the gap between applications and the AI models deployed in the infrastructure. They also deal with a very specific set of issues arising from the use of GPUs and TPUs, with the primary emphasis on [Input/Output](/general-concepts/input-output) optimizations. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# Application Layer | ||
|
||
This chapter is not strictly related to LLMOps, but it is worth discussing the best practices for architecting and developing applications that use LLMs. | ||
|
||
Those applications have to deal with some issues that are not typically met in traditional web development, primarily long-running HTTP requests and MLOps concerns such as using custom models for inference. | ||
|
||
Until [Large Language Models](/general-concepts/large-language-model) became mainstream and in demand across a variety of applications, long-running requests were much less prevalent. Due to functional requirements, microservice requests would typically take 10ms or less, while waiting for a [Large Language Model](/general-concepts/large-language-model) to complete inference can take multiple seconds. | ||
|
||
That calls for some adjustments in the application architecture, namely non-blocking [Input/Output](/general-concepts/input-output) and asynchronous programming. | ||
|
||
This is where languages with good support for asynchronous programming shine, such as Python with its `asyncio` library, Rust with its `tokio` library, or Go with its goroutines. | ||
|
||
Programming languages like `PHP`, which are synchronous by default, might struggle unless supplemented by extensions like [Swoole](https://swoole.com/) (which essentially gives PHP Go-like coroutines) or libraries like [AMPHP](https://amphp.org/). Introducing support for asynchronous programming in PHP can be a challenge, but it is possible. |
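As a rough illustration of why non-blocking I/O matters for long-running inference requests, here is a minimal Python `asyncio` sketch. The `fake_llm_completion` function is a hypothetical stand-in for a slow HTTP call to a model server (it only sleeps), and the 0.2 second latency is an arbitrary assumption chosen for the demo.

```python
import asyncio
import time


async def fake_llm_completion(prompt: str, latency_s: float = 0.2) -> str:
    """Hypothetical stand-in for a slow LLM HTTP call (assumption: it only sleeps)."""
    # asyncio.sleep hands control back to the event loop while "waiting",
    # the same way a non-blocking socket read would.
    await asyncio.sleep(latency_s)
    return f"completion for: {prompt}"


async def handle_requests(prompts: list[str]) -> list[str]:
    """Await all completions concurrently instead of one after another."""
    return await asyncio.gather(*(fake_llm_completion(p) for p in prompts))


def run_demo() -> tuple[list[str], float]:
    """Run three simulated requests and report the total wall-clock time."""
    prompts = ["first", "second", "third"]
    started = time.perf_counter()
    results = asyncio.run(handle_requests(prompts))
    elapsed = time.perf_counter() - started
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_demo()
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

Because the three simulated requests wait concurrently, the total time should stay close to the latency of a single request rather than the sum of all three. A blocking implementation would hold a worker for the full duration of each call.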
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Deployments |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# llama.cpp | ||
|
||
Llama.cpp is a production-ready, open-source runner for various [Large Language Models](/general-concepts/large-language-model). | ||
|
||
It has an excellent built-in [server](https://github.com/ggerganov/llama.cpp/tree/master/examples/server) with an HTTP API. | ||
|
||
In this handbook, we will make the most use of [Continuous Batching](/general-concepts/continuous-batching), which in practice allows handling parallel requests. |
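From the client's point of view, handling parallel requests could look like the sketch below, which sends several prompts at once. Everything specific here is an assumption rather than something this handbook defines: the server address, the `/completion` path, and the `prompt`, `n_predict`, and `content` field names should be checked against the server's own documentation. The helper names (`build_payload`, `send_http`, `complete_many`) are hypothetical.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

# Assumption: a llama.cpp server is listening at this address and path.
SERVER_URL = "http://127.0.0.1:8080/completion"


def build_payload(prompt: str, max_tokens: int = 32) -> dict:
    """Build the request body (field names are assumptions about the server API)."""
    return {"prompt": prompt, "n_predict": max_tokens}


def send_http(payload: dict) -> str:
    """POST one payload to the server and return the generated text."""
    request = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),  # passing data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body.get("content", "")


def complete_many(
    prompts: Iterable[str],
    send: Callable[[dict], str] = send_http,
    workers: int = 4,
) -> list[str]:
    """Send all prompts concurrently; results keep the order of the prompts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: send(build_payload(p)), prompts))
```

The `send` parameter is injectable so the fan-out logic can be exercised without a running server, for example `complete_many(["a", "b"], send=lambda payload: payload["prompt"].upper())`.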
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Continuous Batching |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Input/Output |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Large Language Model |
Empty file.