This has worked on Ubuntu with CUDA installed; .gguf models that fit within GPU RAM (roughly 1:1 model size to VRAM) are blazing fast.
This guide does NOT work with the Jan-Cortex API (sadly). The message
"{"message":"Model has not been loaded, please load model into cortex.llamacpp"}"
is returned, and I can find zero documentation or clues about how to load a model through the web API.
Steps:
1. In a bash terminal, before running the Python code, start the cortex server.
Note: for a (somewhat) custom model you may need to write your own YAML file to match the cortex system;
hopefully that is straightforward.
Load/start/run the model:
cortex run Mistral-Nemo-Instruct-2407-GGUF
Exit the prompt but keep the cortex server running (this step may be optional):
exit()
Check that the model is running, and inspect it:
cortex ps
This will show you the 'name' notation that you need to use to call the model.
Advocacy for Better Documentation
Note: the documentation for
A. the (totally amazing) cortex CLI, and
B. the hopefully-someday-working Jan-Cortex API
is not sufficient to get this code working.
The notation system for actually connecting to a model is this:
convert the last three parts of the full file path into an undocumented colon-demarcated notation.
That is, the cortex "name" of your model is the last three segments of the model path,
separated by colons.
e.g. the path to your model will be something like:
/home/YOURCOMPUTERNAME/cortexcpp/models/huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/Mistral-Nemo-Instruct-2407-Q6_K_L.gguf
Take the last three path segments with a colon ":" between them, so
".../bartowski/Mistral-Nemo-Instruct-2407-GGUF/Mistral-Nemo-Instruct-2407-Q6_K_L.gguf" becomes ->
"model": "bartowski:Mistral-Nemo-Instruct-2407-GGUF:Mistral-Nemo-Instruct-2407-Q6_K_L.gguf",
Using the path itself will NOT work:
"model": "/home/YOURCOMPUTERNAME/cortexcpp/models/huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/Mistral-Nemo-Instruct-2407-Q6_K_L.gguf",
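A minimal Python sketch of that conversion (the helper name path_to_cortex_name is mine, not part of cortex; it just joins the last three path segments with colons):

from pathlib import PurePath

def path_to_cortex_name(model_path):
    # e.g. ".../bartowski/Mistral-Nemo-Instruct-2407-GGUF/Mistral-Nemo-Instruct-2407-Q6_K_L.gguf"
    # ->   "bartowski:Mistral-Nemo-Instruct-2407-GGUF:Mistral-Nemo-Instruct-2407-Q6_K_L.gguf"
    return ":".join(PurePath(model_path).parts[-3:])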
A. The documentation here: https://cortex.so/docs/quickstart/
says to use "model": "llama3.1:8b-gguf"
as the model name, which is not a scheme that has worked for me.
B. The built-in documentation at localhost:39281/
gives you these specs to follow:
curl http://127.0.0.1:39281/v1/chat/completions \
--request POST \
--header 'Content-Type: application/json'
This crashes cortex every time (note that, as written, it sends no request body), and says nothing about model notation.
C. The Jan-Cortex API Fastify documentation: http://localhost:1337/static/index.html
says to use this:
{
  "messages": [
    {
      "content": "You are a helpful assistant.",
      "role": "system"
    },
    {
      "content": "Hello!",
      "role": "user"
    }
  ],
  "model": "tinyllama-1.1b",
  "stream": true,
  "max_tokens": 2048,
  "stop": [
    "hello"
  ],
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "temperature": 0.7,
  "top_p": 0.95
}
the "stop" command seems to break the api by design,
the notation for model-id uses no colon,
The "stop" field is routinely configured to make output impossible
def call_cortex_api(
    cortex_prompt,
    max_output_tokens=600,
    cortex_model="bartowski:Mistral-Nemo-Instruct-2407-GGUF:Mistral-Nemo-Instruct-2407-Q6_K_L.gguf",
    # jan_api_cortex=False,
):
    """
    Call a local cortex.cpp server via the OpenAI-style chat-completions endpoint.

    Before using, start the local Cortex server and load a model,
    then run `cortex ps` in bash to see the 'name' notation for the models.

    Requires:
        import requests
        import json

    Returns:
        (output_string, True) on success, or (error_string, False) on failure.
    """
    # if jan_api_cortex is True:
    #     local_cortex_domain = "http://localhost:1337/"
    # else:
    #     local_cortex_domain = "http://localhost:39281/"
    local_cortex_domain = "http://localhost:39281/"
    try:
        url = f"{local_cortex_domain}v1/chat/completions"
        headers = {"Content-Type": "application/json"}
        data = {
            # Use the last three path segments joined with ":".
            # Using a filesystem path will NOT work, e.g.
            # "model": "/home/YOURCOMPUTERNAME/cortexcpp/models/huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/Mistral-Nemo-Instruct-2407-Q6_K_L.gguf",
            "model": cortex_model,
            "messages": [
                {
                    "role": "user",
                    "content": cortex_prompt,
                },
            ],
            "stream": False,
            "max_tokens": max_output_tokens,
            "stop": [],
            "frequency_penalty": 1,
            "presence_penalty": 1,
            "temperature": 0.5,
            "top_p": 1,
        }
        response = requests.post(url, headers=headers, data=json.dumps(data))
        response_dict = response.json()
        cortex_output_string = response_dict["choices"][0]["message"]["content"]
        return (cortex_output_string, True)
    except Exception as e:
        return (f"Failed, Error: {str(e)}", False)
Cortex version
v1.0.5
Describe the issue and expected behaviour
(Note: I'm not sure about the engine; it is presumably just llama.cpp, but I believe the GPU is being used.)
"""
cortex.cpp cheatsheet https://github.com/janhq/cortex.cpp
For using cortex.cpp within python code:
"{"message":"Model has not been loaded, please load model into cortex.llamacpp"}" is returned,
and I can find zero documentation or clues about how to load a model through the web-api.
Steps:
1. In bash terminal, before running this, start the cortex server:
If not done, install
curl -s https://raw.githubusercontent.com/janhq/cortex/main/engine/templates/linux/install.sh | sudo bash -s -- --deb_local
To list the models and their ids, you can use:
http://127.0.0.1:39281/v1/models
Example of an "id" (used without quotes in the URL):
http://127.0.0.1:39281/v1/models/bartowski:Mistral-Nemo-Instruct-2407-GGUF:Mistral-Nemo-Instruct-2407-Q6_K_L.gguf
"""
import requests
import json
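# A small helper of my own (not part of cortex) to list model ids from the
# /v1/models endpoint mentioned in the docstring above. This assumes the
# OpenAI-style response shape: a JSON object with a top-level "data" list
# of {"id": ...} entries.
def list_cortex_model_ids(base_url="http://127.0.0.1:39281"):
    response = requests.get(f"{base_url}/v1/models")
    return [entry["id"] for entry in response.json().get("data", [])]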
# call_cortex_api() is the same function as defined in the issue description above.
# Before using, start the local Cortex server and run `cortex ps` in bash
# to see the 'name' notation for the models.

#######
# Test
#######

# Timer: for start of script
start_the_timer = start_timer()

# Cortex call function (returns a (text, success) tuple)
cortex_output_string, cortex_call_succeeded = call_cortex_api("Are birds dinosaurs?")
print(cortex_output_string)

# Timer: for end of script
end_timer(start_the_timer)
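The timer helpers used in the test are not defined in this snippet; a minimal sketch of what they could be (my own assumption: simple time.perf_counter wrappers):

import time

def start_timer():
    # Record a start time for simple wall-clock timing.
    return time.perf_counter()

def end_timer(start_time):
    # Print the elapsed seconds since start_timer().
    print(f"Elapsed: {time.perf_counter() - start_time:.2f} seconds")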
Steps to Reproduce
No response
Screenshots / Logs
No response
What is your OS?
Ubuntu 24.04.1 LTS (details below)
What engine are you running?
Presumably cortex.llamacpp / llama.cpp (not certain; see the note above)
Hardware Specs eg OS version, GPU
System Details Report
---

## Report details
- Date generated: 2024-12-23 18:25:17

## Hardware Information:
- Hardware Model: Dell Inc. Precision 7780
- Memory: 128.0 GiB
- Processor: 13th Gen Intel® Core™ i7-13850HX × 28
- Graphics: Intel® Graphics (RPL-S)
- Graphics 1: NVIDIA RTX 3500 Ada Generation Laptop GPU
- Disk Capacity: 1.0 TB

## Software Information:
- Firmware Version: 1.17.0
- OS Name: Ubuntu 24.04.1 LTS
- OS Build: (null)
- OS Type: 64-bit
- GNOME Version: 46
- Windowing System: Wayland
- Kernel Version: Linux 6.8.0-51-generic