Created DEFAULT_NUM_CTX VAR with a default of 32768 #328

Merged · 3 commits · Nov 22, 2024
10 changes: 9 additions & 1 deletion .env.example
@@ -65,4 +65,12 @@ LMSTUDIO_API_BASE_URL=
XAI_API_KEY=

# Include this environment variable if you want more logging for debugging locally
VITE_LOG_LEVEL=debug

# Example Context Values for qwen2.5-coder:32b
#
# DEFAULT_NUM_CTX=32768 # Consumes 36GB of VRAM
> Collaborator: I would also just leave an empty uncommented entry here as well, to avoid any issues with process.env missing the variable, so that we could call it out in errors if necessary.

> Collaborator: Also, this will need to live anywhere else the env vars are being utilized (the Dockerfile and docker-compose files, for example). Other than that, this looks good to me.

# DEFAULT_NUM_CTX=24576 # Consumes 32GB of VRAM
# DEFAULT_NUM_CTX=12288 # Consumes 26GB of VRAM
# DEFAULT_NUM_CTX=6144 # Consumes 24GB of VRAM
DEFAULT_NUM_CTX=
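
A hypothetical sketch of the first reviewer's suggestion, i.e. detecting a missing variable and calling it out explicitly in errors (not part of this PR; assumes Node-style `process.env` access, and the function name is illustrative):

```ts
// Hypothetical startup check (not in this PR): warn when DEFAULT_NUM_CTX is
// missing or empty so later errors can name the variable, and reject values
// that would not make sense as a context size.
function readDefaultNumCtx(): number {
  const raw = process.env.DEFAULT_NUM_CTX;
  if (raw === undefined || raw.trim() === '') {
    console.warn('DEFAULT_NUM_CTX is not set; falling back to 32768');
    return 32768;
  }
  const parsed = Number.parseInt(raw, 10);
  if (!Number.isFinite(parsed) || parsed <= 0) {
    throw new Error(`DEFAULT_NUM_CTX must be a positive integer, got "${raw}"`);
  }
  return parsed;
}
```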
16 changes: 16 additions & 0 deletions CONTRIBUTING.md
@@ -1,4 +1,7 @@
# Contributing to Bolt.new Fork
## DEFAULT_NUM_CTX

The `DEFAULT_NUM_CTX` environment variable sets the context window size, in tokens, requested from Ollama models. For example, to limit the context to 24576 tokens (which uses 32GB of VRAM with qwen2.5-coder:32b), set `DEFAULT_NUM_CTX=24576` in your `.env.local` file.

First off, thank you for considering contributing to Bolt.new! This fork aims to expand the capabilities of the original project by integrating multiple LLM providers and enhancing functionality. Every contribution helps make Bolt.new a better tool for developers worldwide.

@@ -81,6 +84,19 @@ ANTHROPIC_API_KEY=XXX
```bash
VITE_LOG_LEVEL=debug
```

- Optionally set context size:
```bash
DEFAULT_NUM_CTX=32768
```

Some example context values for the qwen2.5-coder:32b model:

- `DEFAULT_NUM_CTX=32768` - Consumes 36GB of VRAM
- `DEFAULT_NUM_CTX=24576` - Consumes 32GB of VRAM
- `DEFAULT_NUM_CTX=12288` - Consumes 26GB of VRAM
- `DEFAULT_NUM_CTX=6144` - Consumes 24GB of VRAM

**Important**: Never commit your `.env.local` file to version control. It's already included in .gitignore.

### 🚀 Running the Development Server
8 changes: 6 additions & 2 deletions Dockerfile
@@ -26,6 +26,7 @@ ARG OPEN_ROUTER_API_KEY
ARG GOOGLE_GENERATIVE_AI_API_KEY
ARG OLLAMA_API_BASE_URL
ARG VITE_LOG_LEVEL=debug
ARG DEFAULT_NUM_CTX

ENV WRANGLER_SEND_METRICS=false \
GROQ_API_KEY=${GROQ_API_KEY} \
@@ -35,7 +36,8 @@ ENV WRANGLER_SEND_METRICS=false \
OPEN_ROUTER_API_KEY=${OPEN_ROUTER_API_KEY} \
GOOGLE_GENERATIVE_AI_API_KEY=${GOOGLE_GENERATIVE_AI_API_KEY} \
OLLAMA_API_BASE_URL=${OLLAMA_API_BASE_URL} \
VITE_LOG_LEVEL=${VITE_LOG_LEVEL} \
DEFAULT_NUM_CTX=${DEFAULT_NUM_CTX}

# Pre-configure wrangler to disable metrics
RUN mkdir -p /root/.config/.wrangler && \
@@ -57,6 +59,7 @@ ARG OPEN_ROUTER_API_KEY
ARG GOOGLE_GENERATIVE_AI_API_KEY
ARG OLLAMA_API_BASE_URL
ARG VITE_LOG_LEVEL=debug
ARG DEFAULT_NUM_CTX

ENV GROQ_API_KEY=${GROQ_API_KEY} \
HuggingFace_API_KEY=${HuggingFace_API_KEY} \
@@ -65,7 +68,8 @@ ENV GROQ_API_KEY=${GROQ_API_KEY} \
OPEN_ROUTER_API_KEY=${OPEN_ROUTER_API_KEY} \
GOOGLE_GENERATIVE_AI_API_KEY=${GOOGLE_GENERATIVE_AI_API_KEY} \
OLLAMA_API_BASE_URL=${OLLAMA_API_BASE_URL} \
VITE_LOG_LEVEL=${VITE_LOG_LEVEL} \
DEFAULT_NUM_CTX=${DEFAULT_NUM_CTX}

RUN mkdir -p ${WORKDIR}/run
CMD pnpm run dev --host
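
Because `DEFAULT_NUM_CTX` is declared as a build `ARG` and re-exported as an `ENV` in both stages, it can also be overridden at image build time, e.g. `docker build --build-arg DEFAULT_NUM_CTX=24576 .` (assuming a plain `docker build` from the repo root; the project may document a different build command).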
6 changes: 5 additions & 1 deletion app/lib/.server/llm/model.ts
@@ -9,6 +9,10 @@ import { createOpenRouter } from "@openrouter/ai-sdk-provider";
import { createMistral } from '@ai-sdk/mistral';
import { createCohere } from '@ai-sdk/cohere'

export const DEFAULT_NUM_CTX = process.env.DEFAULT_NUM_CTX ?
parseInt(process.env.DEFAULT_NUM_CTX, 10) :
32768;

export function getAnthropicModel(apiKey: string, model: string) {
const anthropic = createAnthropic({
apiKey,
@@ -77,7 +81,7 @@ export function getHuggingFaceModel(apiKey: string, model: string) {

export function getOllamaModel(baseURL: string, model: string) {
let Ollama = ollama(model, {
numCtx: DEFAULT_NUM_CTX, // previously hard-coded to 32768
});

Ollama.config.baseURL = `${baseURL}/api`;
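
One caveat in the shipped expression: `process.env.DEFAULT_NUM_CTX ? parseInt(...) : 32768` guards against an unset or empty variable, but a non-numeric value such as `DEFAULT_NUM_CTX=abc` would pass `NaN` through to `numCtx`. A slightly more defensive variant (a sketch, not what this PR ships) could be:

```ts
// Sketch only: fall back to 32768 when DEFAULT_NUM_CTX is unset, empty,
// or not a positive integer, instead of letting NaN reach numCtx.
const parsed = Number.parseInt(process.env.DEFAULT_NUM_CTX ?? '', 10);
export const DEFAULT_NUM_CTX = Number.isFinite(parsed) && parsed > 0 ? parsed : 32768;
```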
2 changes: 2 additions & 0 deletions docker-compose.yaml
@@ -21,6 +21,7 @@ services:
- GOOGLE_GENERATIVE_AI_API_KEY=${GOOGLE_GENERATIVE_AI_API_KEY}
- OLLAMA_API_BASE_URL=${OLLAMA_API_BASE_URL}
- VITE_LOG_LEVEL=${VITE_LOG_LEVEL:-debug}
- DEFAULT_NUM_CTX=${DEFAULT_NUM_CTX:-32768}
- RUNNING_IN_DOCKER=true
extra_hosts:
- "host.docker.internal:host-gateway"
@@ -48,6 +49,7 @@ services:
- GOOGLE_GENERATIVE_AI_API_KEY=${GOOGLE_GENERATIVE_AI_API_KEY}
- OLLAMA_API_BASE_URL=${OLLAMA_API_BASE_URL}
- VITE_LOG_LEVEL=${VITE_LOG_LEVEL:-debug}
- DEFAULT_NUM_CTX=${DEFAULT_NUM_CTX:-32768}
- RUNNING_IN_DOCKER=true
extra_hosts:
- "host.docker.internal:host-gateway"
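
Note the `${DEFAULT_NUM_CTX:-32768}` interpolation: Compose substitutes 32768 whenever the variable is unset or empty in the host environment or `.env` file, so a one-off override such as `DEFAULT_NUM_CTX=24576 docker compose up` works without editing the file.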