bug: Please clarify documentation for model name and stop and jan-api #1823

lineality · 2024-12-24T01:26:10Z

Cortex version

v1.0.5

Describe the issue and expected behaviour

(Note: I am not sure which engine is in use; probably just llama.cpp, and I think the GPU is being used.)

"""

cortex.cpp cheatsheet https://github.com/janhq/cortex.cpp

For using cortex.cpp within python code:

This has worked on Ubuntu with CUDA installed; .gguf models up to roughly the size of the GPU RAM (1:1) are blazing fast.
This guide does NOT work with the jan-cortex api (sadly). A message saying:
"{"message":"Model has not been loaded, please load model into cortex.llamacpp"}" is returned,
and I can find zero documentation or clues about how to load a model through the web-api.

Steps:

1. In bash terminal, before running this, start the cortex server:

If not done, install

curl -s https://raw.githubusercontent.com/janhq/cortex/main/engine/templates/linux/install.sh | sudo bash -s -- --deb_local

To be safe, do a fresh install of the Hugging Face model (check that the license is appropriate for your use and that you trust the authors):

cortex pull bartowski/Mistral-Nemo-Instruct-2407-GGUF

Note: for a ~custom model you may need to make your own yaml file to match the cortex system,
hopefully very straightforward.

Load/Start/Run model:

cortex run Mistral-Nemo-Instruct-2407-GGUF

Exit prompt but keep cortex server running:
(This step may be optional.)

exit()

Check that model is running & inspect:

cortex ps

This will show you the 'name' notation that you need to use to call the model.

Advocacy for Better Documentation

Note: Documentation for
A. the (totally amazing) cortex cli, and
B. the hopefully someday working Jan-cortex api,
is not sufficient to get the code working.

The notation that actually works for connecting to a model is this:

The cortex "name" of your model is the last three parts of the full model file path,
joined by colons. This colon-delimited notation is undocumented.

e.g.

The path to your model will be something like this:

/home/YOURCOMPUTERNAME/cortexcpp/models/huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/Mistral-Nemo-Instruct-2407-Q6_K_L.gguf

Use the last three path segments with a colon ":" between them:

    So, "bartowski/Mistral-Nemo-Instruct-2407-GGUF/Mistral-Nemo-Instruct-2407-Q6_K_L.gguf" becomes ->
    "model": "bartowski:Mistral-Nemo-Instruct-2407-GGUF:Mistral-Nemo-Instruct-2407-Q6_K_L.gguf",

Using the full path will NOT work: "model": "/home/YOURCOMPUTERNAME/cortexcpp/models/huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/Mistral-Nemo-Instruct-2407-Q6_K_L.gguf",
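
The path-to-name rule above can be sketched in a few lines of Python. This is only an illustration, assuming the model file sits under an author/repo/file layout like the example path; the helper name model_name_from_path is made up for this sketch and is not part of cortex. Check the result against what cortex ps reports.

```python
from pathlib import PurePosixPath


def model_name_from_path(model_path):
    # Hypothetical helper (not part of cortex): join the last three
    # path segments with ":" to build the model name.
    parts = [p for p in PurePosixPath(model_path).parts if p != "/"]
    if len(parts) < 3:
        raise ValueError("expected at least three path segments")
    return ":".join(parts[-3:])


example_path = (
    "/home/YOURCOMPUTERNAME/cortexcpp/models/huggingface.co/"
    "bartowski/Mistral-Nemo-Instruct-2407-GGUF/"
    "Mistral-Nemo-Instruct-2407-Q6_K_L.gguf"
)
print(model_name_from_path(example_path))
```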

A. Documentation here:
https://cortex.so/docs/quickstart/
says to use: "model": "llama3.1:8b-gguf",
as the model name, a notation that has not worked for me.

B. The Built-in documentation: localhost:39281/
gives you these specs to follow:

curl http://127.0.0.1:39281/v1/chat/completions \
  --request POST \
  --header 'Content-Type: application/json'

This crashes cortex every time, and says nothing about model notation.

C. The jan cortex api fastify documentation: http://localhost:1337/static/index.html
says to use this:
{
  "messages": [
    {
      "content": "You are a helpful assistant.",
      "role": "system"
    },
    {
      "content": "Hello!",
      "role": "user"
    }
  ],
  "model": "tinyllama-1.1b",
  "stream": true,
  "max_tokens": 2048,
  "stop": [
    "hello"
  ],
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "temperature": 0.7,
  "top_p": 0.95
}
The "stop" value in this example seems to break the api by design,
and the notation for the model-id uses no colon.

The "stop" field in the documented examples is routinely configured in a way that makes output impossible.

Do not set "stop" to null (None in Python):
leave "stop" out entirely, or use [].
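
One way to follow that advice is to build the request body in a small helper that never sends "stop" as null. This is a minimal sketch; the helper name build_chat_payload and its defaults are assumptions for illustration, not something from the cortex documentation.

```python
import json


def build_chat_payload(model, prompt, stop=None, max_tokens=600):
    # Hypothetical helper: build a request body for /v1/chat/completions.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "max_tokens": max_tokens,
    }
    # Only include "stop" when there is a real list of stop strings,
    # so the field is never serialized as null.
    if stop:
        payload["stop"] = list(stop)
    return payload


body = build_chat_payload(
    "bartowski:Mistral-Nemo-Instruct-2407-GGUF:Mistral-Nemo-Instruct-2407-Q6_K_L.gguf",
    "Are birds dinosaurs?",
)
print(json.dumps(body, indent=2))
```

Leaving the key out when there are no stop strings is one of the two options named above; sending an empty list would be the other.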

To list the available models, you can use:

http://127.0.0.1:39281/v1/models

To look up a single model, append its "id" (without quotes) to that URL, e.g.:
http://127.0.0.1:39281/v1/models/bartowski:Mistral-Nemo-Instruct-2407-GGUF:Mistral-Nemo-Instruct-2407-Q6_K_L.gguf
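
The same lookup can be done from Python. The sketch below assumes the /v1/models response follows the OpenAI-style shape, a "data" list whose entries each carry an "id"; that shape is an assumption here, so inspect the raw response first. The function names are made up for this example.

```python
import json
from urllib.request import urlopen


def extract_model_ids(models_response):
    # Assumes an OpenAI-style body: {"data": [{"id": "..."}, ...]}.
    return [
        entry["id"]
        for entry in models_response.get("data", [])
        if isinstance(entry, dict) and "id" in entry
    ]


def list_cortex_model_ids(base_url="http://127.0.0.1:39281"):
    # Ask the local cortex server which models it knows about.
    with urlopen(f"{base_url}/v1/models", timeout=10) as response:
        return extract_model_ids(json.load(response))
```

With the cortex server running, print(list_cortex_model_ids()) should show the colon-separated names to pass as "model".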

"""

import requests
import json

def call_cortex_api(
    cortex_prompt,
    max_output_tokens=600,
    cortex_model="bartowski:Mistral-Nemo-Instruct-2407-GGUF:Mistral-Nemo-Instruct-2407-Q6_K_L.gguf",
    # jan_api_cortex=False,
):
    """
    Send one user prompt to the local Cortex server.
    Returns a tuple: (output_string, success_flag).

    Before using, start the local Cortex 'server', then run
        cortex ps
    in bash to see the name/notation for the models.

    Requires:
        import requests
        import json
    """

    # if jan_api_cortex is True:
    #     local_cortex_domain = "http://localhost:1337/"
    # else:
    #     local_cortex_domain = "http://localhost:39281/"

    local_cortex_domain = "http://localhost:39281/"

    try:
        url = f"{local_cortex_domain}v1/chat/completions"

        headers = {
            "Content-Type": "application/json"
        }

        data = {
            # use last three path segments with : between
            "model": cortex_model,
            # using path will NOT work "model": "/home/YOURCOMPUTERNAME/cortexcpp/models/huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/Mistral-Nemo-Instruct-2407-Q6_K_L.gguf",
            "messages": [
                {
                    "role": "user",
                    "content": cortex_prompt
                },
            ],
            "stream": False,
            "max_tokens": max_output_tokens,
            # leave "stop" out or use []; do not use None
            "stop": [],
            "frequency_penalty": 1,
            "presence_penalty": 1,
            "temperature": 0.5,
            "top_p": 1
        }

        response = requests.post(url, headers=headers, data=json.dumps(data))

        response_dict = response.json()

        # the generated text is in the first choice's message
        cortex_output_string = response_dict["choices"][0]["message"]["content"]

        return (cortex_output_string, True)

    except Exception as e:
        return (f"Failed, Error: {str(e)}", False)
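
When the server answers with an error body such as {"message": "Model has not been loaded, ..."}, there is no "choices" key, so the lookup in the function above fails with a KeyError and the caller only sees "Failed, Error: 'choices'". A small parsing helper can surface the server's own message instead. This is a sketch, assuming error bodies always have the {"message": ...} shape quoted earlier; parse_chat_response is a made-up name.

```python
def parse_chat_response(response_dict):
    # Hypothetical helper: returns (text, success_flag), the same
    # tuple shape that call_cortex_api returns.
    choices = response_dict.get("choices")
    if choices:
        return (choices[0]["message"]["content"], True)
    # Assumed error shape: {"message": "..."}.
    error_message = response_dict.get("message", "unexpected response shape")
    return (f"Failed, Error: {error_message}", False)


error_body = {
    "message": "Model has not been loaded, please load model into cortex.llamacpp"
}
print(parse_chat_response(error_body))
```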

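#######
# Test
#######

import time

# Timer: For Start of Script
# (time.perf_counter() is used in place of the start_timer()/end_timer()
# helpers, which are not shown here)
start_the_timer = time.perf_counter()

# Cortex Call Function
# call_cortex_api returns (output_string, success_flag)
cortex_output_string, call_succeeded = call_cortex_api("Are birds dinosaurs?")

print(cortex_output_string)

# Timer: For End of Script
print(f"Elapsed: {time.perf_counter() - start_the_timer:.2f} seconds")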

Steps to Reproduce

No response

Screenshots / Logs

No response

What is your OS?

Windows
Mac Silicon
Mac Intel
Linux / Ubuntu

What engine are you running?

cortex.llamacpp (default)
cortex.tensorrt-llm (Nvidia GPUs)
cortex.onnx (NPUs, DirectML)

Hardware Specs eg OS version, GPU

System Details Report
---
## Report details
- Date generated: 2024-12-23 18:25:17
## Hardware Information:
- Hardware Model: Dell Inc. Precision 7780
- Memory: 128.0 GiB
- Processor: 13th Gen Intel® Core™ i7-13850HX × 28
- Graphics: Intel® Graphics (RPL-S)
- Graphics 1: NVIDIA RTX 3500 Ada Generation Laptop GPU
- Disk Capacity: 1.0 TB
## Software Information:
- Firmware Version: 1.17.0
- OS Name: Ubuntu 24.04.1 LTS
- OS Build: (null)
- OS Type: 64-bit
- GNOME Version: 46
- Windowing System: Wayland
- Kernel Version: Linux 6.8.0-51-generic

lineality added the type: bug Something isn't working label Dec 24, 2024

github-project-automation bot added this to Jan & Cortex Dec 24, 2024

github-project-automation bot moved this to Investigating in Jan & Cortex Dec 24, 2024
