Interactive #47

luca-saggese · 2023-05-06T12:51:54Z

I'm new to llm and llama but learning fast, I've wrote a small piece of code to chat via cli, but it seems to not follow the context (ie work in interactive mode).

import { LLM } from "llama-node";
import readline from "readline";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import { LLamaRS } from "llama-node/dist/llm/llama-rs.js";
const saveSession = path.resolve(process.cwd(), "./tmp/session.bin");
const loadSession = path.resolve(process.cwd(), "./tmp/session.bin");

import path from "path";
const model = path.resolve(process.cwd(), "./ggml-vic7b-q4_1.bin"); 

const llama = new LLM(LLamaRS);
llama.load({ path: model });


var rl = readline.createInterface(process.stdin, process.stdout);
console.log("Chatbot started!");
rl.setPrompt("> ");
rl.prompt();
rl.on("line", async function (line) {
    const prompt = `A chat between a user and an assistant.
    USER: ${line}
    ASSISTANT:`;
    llama.createCompletion({
        prompt,
        numPredict: 128,
        temp: 0.2,
        topP: 1,
        topK: 40,
        repeatPenalty: 1,
        repeatLastN: 64,
        seed: 0,
        feedPrompt: true,
        saveSession,
        loadSession,
    }, (response) => {
        if(response.completed) {
            process.stdout.write('\n'); 
            rl.prompt(); 
        } else {
            process.stdout.write(response.token);
        }  
    });
});

I'm missing something?

The text was updated successfully, but these errors were encountered:

hlhr202 · 2023-05-06T15:08:27Z

@luca-saggese you need to maintain the context on the nodejs side. ie. you should maintain a list of chatting histories where every items of the list should not exceed the context length of your model. thats why llama-node also expose the tokenizer to node.js.

luca-saggese · 2023-05-06T19:55:05Z

@hlhr202 thanks for the comment, where should I pass the context to the new query? within the prompt?

hlhr202 · 2023-05-07T04:14:30Z

@hlhr202 thanks for the comment, where should I pass the context to the new query? within the prompt?

yes, your prompt should be a string that compose chatting list. at the same time you also have to make sure it doesnt exceed the context length limit of the model

luca-saggese · 2023-05-07T04:40:48Z

understood, and what is the point of saveSession and loadSession?

hlhr202 · 2023-05-07T04:56:31Z

understood, and what is the point of saveSession and loadSession?

#24

They are used for accelerating loading.

suspicious-pineapple · 2023-05-10T10:42:38Z

@luca-saggese
i had great success using saveSession/loadSession for chatbots. (thanks for implementing it hlhr202 <3 it made everything so much easier)

Keeping a list of previous messages in every prompt (as he suggested) works, but is slow.

Instead, during startup, i call createCompletion (initial prompt) with feedPromptOnly and saveSession once. (can also copy the initial cache file to make future startup faster)

Every new message is added individually with feedPromptOnly, saveSession+loadSession

to get a bot response, just call without feedPromptOnly as usual

This is still limited by context length, with the added disadvantage that you can't clear old messages (takes a while to run into the 2048 token ctx limit tho)

also seems to improve "conversation memory" without extra cost of including more messages in the chat history

suspicious-pineapple · 2023-05-10T10:46:06Z

regarding the context length limit; rustformers/llm#77 might be related

luca-saggese · 2023-05-10T11:25:47Z

@end-me-please thanks fo the help, here is a working version for anyone interested:

import { LLM } from "llama-node";
import readline from "readline";
import fs from "fs";
import { LLamaRS } from "llama-node/dist/llm/llama-rs.js";
import path from "path";

const sessionFile = path.resolve(process.cwd(), "./tmp/session.bin");
const saveSession = sessionFile;
const loadSession = sessionFile;
// remove old session
if(fs.existsSync(sessionFile)) fs.unlinkSync(sessionFile);


const model = path.resolve(process.cwd(), "./ggml-vic7b-q4_1.bin"); // ggml-vicuna-7b-1.1-q4_1.bin");

const llama = new LLM(LLamaRS);
llama.load({ path: model });

var rl = readline.createInterface(process.stdin, process.stdout);
console.log("Chatbot started!");
rl.setPrompt("> ");
rl.prompt();
let cnt = 0;
rl.on("line", async function (line) {
    // Here Passing our input text to the manager to get response and display response answer.
    const prompt = `USER: ${line}
                    ASSISTANT:`;
    llama.createCompletion({
        prompt: cnt ==0 ? 'A chat between a user and an assistant.\n\n' + prompt : prompt,
        numPredict: 1024,
        temp: 0.2,
        topP: 1,
        topK: 40,
        repeatPenalty: 1,
        repeatLastN: 64,
        seed: 0,
        feedPrompt: true, //: cnt == 0,
        saveSession,
        loadSession,
    }, (response) => {
        if(response.completed) {
            process.stdout.write('\n'); 
            rl.prompt(); 
            cnt ++;
        } else {
            process.stdout.write(response.token);
        }  
    });
});

ralyodio · 2023-05-15T01:11:07Z

can we make it so previous prompts are part of an array? Otherwise it would continuously show the entire history with every response.

linonetwo · 2023-07-19T06:52:49Z

There is an interactive GUI in https://talk.tiddlywiki.org/t/tidgi-is-the-first-bi-link-note-taking-app-with-free-local-ai-that-works-totally-offline-and-privately/7600

CodeJjang · 2023-09-09T12:19:06Z

@end-me-please @luca-saggese I can't make it work.
I am calling:

llama.load(config).then(() => {
    return llama.createCompletion({
      nThreads: 4,
      nTokPredict: 2048,
      topK: 40,
      topP: 0.1,
      temp: 0.8,
      repeatPenalty: 1,
      prompt: instructions,
      feedPrompt: true,
      feedPromptOnly: true,
      saveSession,
      loadSession
    }, (resp) => {console.log(resp)})
  }).then(() => console.log('Finished init llm'))

Two weird things:

No session file created
Why is the callback of "console.log(resp)" being called if feedPromptOnly is true (i.e. shouldn't do inference)?

And then:

    const resp = await llama.createCompletion({
      nThreads: 4,
      nTokPredict: 2048,
      topK: 40,
      topP: 0.1,
      temp: 0.8,
      repeatPenalty: 1,
      prompt,
      loadSession
    }, (cbResp) => {process.stdout.write(cbResp.token);})

The first prompt that I fed is completely ignored...

hlhr202 added the question Further information is requested label May 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Interactive #47

Interactive #47

luca-saggese commented May 6, 2023 •

edited

Loading

hlhr202 commented May 6, 2023

luca-saggese commented May 6, 2023

hlhr202 commented May 7, 2023

luca-saggese commented May 7, 2023

hlhr202 commented May 7, 2023 •

edited

Loading

suspicious-pineapple commented May 10, 2023

suspicious-pineapple commented May 10, 2023

luca-saggese commented May 10, 2023 •

edited

Loading

ralyodio commented May 15, 2023

linonetwo commented Jul 19, 2023

CodeJjang commented Sep 9, 2023

Interactive #47

Interactive #47

Comments

luca-saggese commented May 6, 2023 • edited Loading

hlhr202 commented May 6, 2023

luca-saggese commented May 6, 2023

hlhr202 commented May 7, 2023

luca-saggese commented May 7, 2023

hlhr202 commented May 7, 2023 • edited Loading

suspicious-pineapple commented May 10, 2023

suspicious-pineapple commented May 10, 2023

luca-saggese commented May 10, 2023 • edited Loading

ralyodio commented May 15, 2023

linonetwo commented Jul 19, 2023

CodeJjang commented Sep 9, 2023

luca-saggese commented May 6, 2023 •

edited

Loading

hlhr202 commented May 7, 2023 •

edited

Loading

luca-saggese commented May 10, 2023 •

edited

Loading