Merge pull request #240 from janhq/Hardware
docs: initial hardware content
Showing 20 changed files with 738 additions and 216 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,19 @@ | ||
--- | ||
title: "@janhq: 2x4090 Workstation" | ||
--- | ||
--- | ||
|
||
![Jan-Workstation](https://media.discordapp.net/attachments/964896173401976932/1158437407675387964/Jan-workstation_812x520_via_10015_io.png?ex=651c3e68&is=651aece8&hm=e2548dd8ee20f9ecbc5d13bec7040d00b6e91cb055e5d0fad33a1e232d275caf&=&width=668&height=428) | ||
|
||
## Jan 2x4090 Workstation: Components List | ||
|
||
| Type | Item | Price | | ||
| :------------------- | :----------------------------------------------- | :------ | | ||
| **CPU**              | [RYZEN THREADRIPPER PRO 5965WX 280W SP3 WOF](#)  | $2,229  | ||
| **Motherboard** | [ASUS PRO WS WRX80E SAGE SE WIFI](#) | $933 | | ||
| **GPU** | [ASUS STRIX RTX 4090 24GB OC](#) | $4,345 | | ||
| **RAM**              | [G.SKILL RIPJAWS S5 2x32 6000C32](#)             | $92.99  | ||
| **Storage PCIe-SSD** | [SAMSUNG 990 PRO 2TB NVME 2.0](#) | $134.99 | | ||
| **Cooler** | [BEQUIET DARK ROCK 4 PRO TR4](#) | $89.90 | | ||
| **Power Supply** | [FSP CANNON 2000W PRO 92+ FULL MODULAR PSU](#) | $449.99 | | ||
| **Case** | [VEDDHA 6GPUS FRAME BLACK](#) | $59.99 | | ||
| **Total cost**       |                                                  | $8,334.86 | |
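
The total in the table can be checked by summing the listed prices. The snippet below is a minimal sketch of that arithmetic; the `PARTS` dictionary and `total_cost` helper are illustrative names, and it assumes each row's price is the full line total for that component type (the table does not say whether the GPU price covers one card or both).

```python
from decimal import Decimal

# Prices copied from the components table (USD).
# Assumption: each price is the full line total for that row.
PARTS = {
    "CPU": Decimal("2229"),
    "Motherboard": Decimal("933"),
    "GPU": Decimal("4345"),
    "RAM": Decimal("92.99"),
    "Storage PCIe-SSD": Decimal("134.99"),
    "Cooler": Decimal("89.90"),
    "Power Supply": Decimal("449.99"),
    "Case": Decimal("59.99"),
}


def total_cost(parts):
    """Sum the line totals exactly, avoiding float rounding on cents."""
    return sum(parts.values(), Decimal("0"))


if __name__ == "__main__":
    print(f"Total: ${total_cost(PARTS):,.2f}")
```

Using `Decimal` rather than floats keeps the cents exact, which matters when a total is compared against a published figure.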
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,4 @@ | ||
--- | ||
sidebar_position: 1 | ||
title: Hardware | ||
title: Introduction | ||
--- | ||
|
||
TODO |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
--- | ||
title: Cloud vs. Self-hosting Your AI | ||
--- | ||
|
||
Choosing how to run your AI (on rented cloud GPUs, on-premises, or through an API provider) involves trade-offs in cost, privacy, and control. The following is a rough, first-pass comparison of renting vs. self-hosting. | ||
|
||
## Cost Comparison | ||
|
||
The estimates below use these general assumptions: | ||
|
||
| | Self-Hosted | GPT 4.0 | GPU Rental | | ||
| ---------- | ---------------------------------------------- | ------------------ | -------------------------- | | ||
| Unit Costs | $10k upfront for 2x4090s (5-year amortization) | $0.00012 per token | $4.42 per hour for 1x H100 | | ||
|
||
- An average of 800 tokens (input and output combined) per request | ||
- An inference speed of 24 tokens per second | ||
|
||
### Low Usage | ||
|
||
When operating at low capacity: | ||
|
||
| | Self-Hosted | GPT 4.0 | GPU Rental | | ||
| ---------------- | ----------- | ------- | ---------- | | ||
| Cost per Request | $2.33 | $0.10 | $0.04 | | ||
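
The GPT 4.0 and GPU Rental figures above follow from the stated assumptions, and the sketch below reproduces that arithmetic. The constant and function names are illustrative, and it assumes a rented GPU is billed only for the seconds spent generating. The self-hosted figure depends on a monthly request volume that is not stated here, so it is not reproduced.

```python
# Assumptions taken from the unit-cost table and bullet list above.
TOKENS_PER_REQUEST = 800        # average input + output tokens
TOKENS_PER_SECOND = 24          # inference speed
API_PRICE_PER_TOKEN = 0.00012   # USD, GPT 4.0 column
RENTAL_PRICE_PER_HOUR = 4.42    # USD, 1x H100


def api_cost_per_request(tokens=TOKENS_PER_REQUEST,
                         price_per_token=API_PRICE_PER_TOKEN):
    """Cost of one request when paying per token."""
    return tokens * price_per_token


def rental_cost_per_request(tokens=TOKENS_PER_REQUEST,
                            tokens_per_second=TOKENS_PER_SECOND,
                            price_per_hour=RENTAL_PRICE_PER_HOUR):
    """Cost of one request on a rented GPU.

    Assumption: the GPU is billed only while it is generating tokens.
    """
    seconds = tokens / tokens_per_second
    return price_per_hour * seconds / 3600


if __name__ == "__main__":
    print(f"API:    ${api_cost_per_request():.2f} per request")
    print(f"Rental: ${rental_cost_per_request():.2f} per request")
```

Rounded to cents, these come out to the $0.10 and $0.04 shown in the table.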
|
||
### High Usage | ||
|
||
When operating at full capacity, i.e. 24 hours a day, which works out to about 77.8k requests per month: | ||
|
||
| | Self-Hosted | GPT 4.0 | GPU Rental | | ||
| -------------- | ------------ | ------- | ---------- | | ||
| Cost per Month | $166 (fixed) | $7,465 | $3,182 | | ||
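
The monthly figures can be derived the same way. The following is a rough sketch, assuming a 30-day month (which is what yields roughly 77.8k requests) and counting only the amortized hardware price for self-hosting, with no electricity or maintenance. All names are illustrative.

```python
# Assumptions taken from the unit-cost table and bullet list above.
TOKENS_PER_REQUEST = 800
TOKENS_PER_SECOND = 24
API_PRICE_PER_TOKEN = 0.00012   # USD
RENTAL_PRICE_PER_HOUR = 4.42    # USD, 1x H100
HARDWARE_COST = 10_000          # USD upfront for 2x4090s
AMORTIZATION_MONTHS = 5 * 12    # 5-year amortization
HOURS_PER_MONTH = 24 * 30       # assumption: 30-day month, running 24/7


def requests_per_month():
    """Requests served in a month at full utilization."""
    seconds = HOURS_PER_MONTH * 3600
    return seconds * TOKENS_PER_SECOND / TOKENS_PER_REQUEST


def monthly_costs():
    """Monthly cost in USD for each option at full utilization."""
    requests = requests_per_month()
    return {
        # Hardware only; power and maintenance are not modeled.
        "self_hosted": HARDWARE_COST / AMORTIZATION_MONTHS,
        "api": requests * TOKENS_PER_REQUEST * API_PRICE_PER_TOKEN,
        "rental": RENTAL_PRICE_PER_HOUR * HOURS_PER_MONTH,
    }


if __name__ == "__main__":
    print(f"Requests per month: {requests_per_month():,.0f}")
    for option, cost in monthly_costs().items():
        print(f"{option}: ${cost:,.2f}")
```

The self-hosted cost stays flat regardless of volume, while the API cost scales with the number of tokens and the rental cost with the number of hours.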
|
||
### Incremental Costs | ||
|
||
Large-context use cases are also worth evaluating. For example, suppose you had to write a 500-word essay summarizing Tolstoy's "War and Peace": | ||
|
||
| | Self-Hosted | GPT 4.0 | GPU Rental | | ||
| ----------------------- | -------------------- | ------- | ---------- | | ||
| Cost of "War and Peace" | (upfront fixed cost) | $94 | $40 | | ||
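
The "War and Peace" figures can be approximated with the same unit costs. The text does not state a token count for the book; the 783,000 tokens used below is an assumption back-computed from the $94 entry, and the handful of output tokens for the 500-word essay is ignored. Names are illustrative.

```python
# Assumption: token count is NOT given in the source; it is back-computed
# from the $94 figure at $0.00012 per token.
BOOK_TOKENS = 783_000

TOKENS_PER_SECOND = 24          # inference speed from the assumptions above
API_PRICE_PER_TOKEN = 0.00012   # USD
RENTAL_PRICE_PER_HOUR = 4.42    # USD, 1x H100


def api_cost(tokens, price_per_token=API_PRICE_PER_TOKEN):
    """Per-token API cost of processing `tokens` tokens."""
    return tokens * price_per_token


def rental_cost(tokens, tokens_per_second=TOKENS_PER_SECOND,
                price_per_hour=RENTAL_PRICE_PER_HOUR):
    """Rental cost of processing `tokens` tokens at a fixed throughput."""
    hours = tokens / tokens_per_second / 3600
    return price_per_hour * hours


if __name__ == "__main__":
    print(f"API:    ${api_cost(BOOK_TOKENS):.0f}")
    print(f"Rental: ${rental_cost(BOOK_TOKENS):.0f}")
```

Under that assumed token count, the results round to the $94 and $40 in the table, which suggests both entries come from the same token count.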
|
||
> **Takeaway**: Renting cloud GPUs or using an API is a good way to get started and to scale initially. However, costs rise quickly with large datasets and long context windows. If you need predictable costs, self-hosting is an attractive option. | ||
## Business Considerations | ||
|
||
Other business-level considerations include the following (✅ marks an advantage, ❌ a drawback): | ||
|
||
| | Self-Hosted | GPT 4.0 | GPU Rental | | ||
| ----------------------- | ----------- | ------- | ---------- | | ||
| Data Privacy | ✅ | ❌ | ❌ | | ||
| Offline Mode | ✅ | ❌ | ❌ | | ||
| Customization & Control | ✅ | ❌ | ✅ | | ||
| Auditing | ✅ | ❌ | ✅ | | ||
| Setup Complexity | ❌ | ✅ | ✅ | | ||
| Setup Cost | ❌ | ✅ | ✅ | | ||
| Maintenance | ❌ | ✅ | ❌ | | ||
|
||
## Conclusion | ||
|
||
The decision to run LLMs in the cloud or on in-house servers is not one-size-fits-all. It depends on your business's specific needs, budget, and security requirements. Cloud-based LLMs offer scalability and low upfront cost but raise potential data-privacy concerns, while in-house servers provide greater control, customization, and cost predictability. | ||
|
||
In some situations, a mix of cloud and in-house resources works best. Businesses should assess their needs and existing assets carefully before choosing how to run LLMs. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,14 @@ | ||
--- | ||
title: CPU vs. GPU | ||
--- | ||
title: "GPU vs. CPU: What's the Difference?" | ||
--- | ||
|
||
## CPU vs. GPU | ||
|
||
| | CPU | GPU | | ||
| ------------------- | ------------------------------------------------------------------------ | ------------------------------------------------------- | | ||
| **Function** | Generalized component that handles main processing functions of a server | Specialized component that excels at parallel computing | | ||
| **Processing** | Designed for serial instruction processing | Designed for parallel instruction processing | | ||
| **Design**          | Fewer, more powerful cores                                               | Many more cores, each less powerful than a CPU core     | ||
| **Best suited for** | General-purpose computing applications | High-performance computing applications | | ||
|
||
![CPU VS GPU](https://media.discordapp.net/attachments/964896173401976932/1157998193741660222/CPU-vs-GPU-rendering.png?ex=651aa55b&is=651953db&hm=a22c80ed108a0d25106a20aa25236f7d0fa74167a50788194470f57ce7f4a6ca&=&width=807&height=426) |