llama.node

Node.js binding for llama.cpp.

llama.cpp: Inference of LLaMA model in pure C/C++

Installation

npm install @fugood/llama.node

Usage

import { loadModel } from '@fugood/llama.node'

// Initialize a Llama context with the model (may take a while)
const context = await loadModel({
  model: 'path/to/gguf/model',
  use_mlock: true,
  n_ctx: 2048,
  n_gpu_layers: 1, // > 0: enable GPU
  // embedding: true, // use embedding
  // lib_variant: 'vulkan', // Change backend (see Lib Variants)
})

// Run a completion
const { text } = await context.completion(
  {
    prompt: 'This is a conversation between user and llama, a friendly chatbot. respond in simple markdown.\n\nUser: Hello!\nLlama:',
    n_predict: 100,
    stop: ['</s>', 'Llama:', 'User:'],
    // n_threads: 4,
  },
  (data) => {
    // This is a partial completion callback
    const { token } = data
  },
)
console.log('Result:', text)
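
The partial completion callback above only destructures `token`. As a minimal sketch of how streamed tokens could be accumulated, the helper below collects them into a string. `createTokenCollector` is a hypothetical name and not part of @fugood/llama.node; it assumes only that the callback receives objects shaped like `{ token }`, as in the example above.

```javascript
// Hypothetical helper, not part of @fugood/llama.node.
// Assumes each callback payload looks like { token: string }.
function createTokenCollector(onToken) {
  const tokens = []
  return {
    // Pass this as the second argument to context.completion(...)
    callback(data) {
      if (data && typeof data.token === 'string') {
        tokens.push(data.token)
        if (onToken) onToken(data.token)
      }
    },
    // Text received so far
    text() {
      return tokens.join('')
    },
  }
}

// Usage sketch (needs a loaded `context`, see the example above):
// const collector = createTokenCollector((t) => process.stdout.write(t))
// await context.completion({ prompt: 'User: Hello!\nLlama:', n_predict: 100 }, collector.callback)
// console.log('Streamed so far:', collector.text())
```

The callback closes over its own state rather than relying on `this`, so it can be passed around detached from the collector object.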

Lib Variants

default: General usage; no GPU support except on macOS (Metal)
vulkan: GPU support via Vulkan (Windows/Linux); may be unstable in some scenarios
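
The variant list above can be turned into a small selection rule. The following is only a sketch: `pickLibVariant` is a hypothetical helper, not part of @fugood/llama.node, and it encodes nothing beyond the two variants described above.

```javascript
// Hypothetical helper, not part of @fugood/llama.node.
// Picks a lib_variant string based on the variant descriptions above.
function pickLibVariant(wantGpu, platform = process.platform) {
  if (!wantGpu) return 'default'
  // On macOS the default variant already uses the GPU through Metal
  if (platform === 'darwin') return 'default'
  // Vulkan is the listed GPU option for Windows and Linux
  if (platform === 'win32' || platform === 'linux') return 'vulkan'
  // Any other platform: fall back to the general-purpose build
  return 'default'
}

// Usage sketch (options as in the Usage section):
// const context = await loadModel({
//   model: 'path/to/gguf/model',
//   n_gpu_layers: 1,
//   lib_variant: pickLibVariant(true),
// })
```

Since the vulkan variant is described as potentially unstable, falling back to `default` when loading fails may be a reasonable precaution.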

License

MIT

Built and maintained by BRICKS.
