Support for Embedding Models #4117

SpaceCowboy850 · 2023-11-17T22:10:58Z

I mainly want to voice my support for a solid embedding model in GGML. Even without typical LLM capabilities, I think there would be a TON of use for embedding documents of various types and doing similarity lookup, even if the result were simply the matched passage. That would be so much better than string search.
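
To make the similarity-lookup idea concrete, here is a minimal sketch in Python. It assumes each passage has already been turned into a vector by some embedding model; the three-dimensional vectors and the helper names `cosine_similarity` and `best_passage` are made up for illustration and do not come from GGML or bert.cpp.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def best_passage(query_vec, passages):
    """Return the (text, vector) pair whose vector is closest to query_vec."""
    return max(passages, key=lambda p: cosine_similarity(query_vec, p[1]))


# Toy vectors standing in for real model output (hypothetical values).
passages = [
    ("How to build the project on Windows", [0.9, 0.1, 0.0]),
    ("Quantization formats explained", [0.1, 0.8, 0.3]),
    ("Running the server with GPU offload", [0.2, 0.2, 0.9]),
]
query = [0.85, 0.2, 0.05]

text, _ = best_passage(query, passages)
print(text)
```

A real application would replace the toy vectors with model output and would probably use an index rather than a linear scan once the number of passages grows.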

I have found this fork, which appears to be abandoned but works:
https://github.com/skeskinen/bert.cpp
Note for Windows users: I needed to make two modifications to get it to work.
In the Python sample_client.py, change
with open(os.path.join(os.path.dirname(__file__), txt_file), 'r') as f:
to this
with open(os.path.join(os.path.dirname(__file__), txt_file), 'r', encoding="utf-8") as f:

And in server.cpp on Windows, change
ssize_t bytes_received = read(socket, buffer, sizeof(buffer));
to
ssize_t bytes_received = recv(socket, buffer, sizeof(buffer), 0);

And it seems to at least run the example.
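
For context on the first change: when no `encoding` is given, Python's `open()` uses the platform's locale encoding, which on Windows is often not UTF-8, so a UTF-8 text file can fail to decode or come out garbled. The sketch below is not part of bert.cpp; it just writes a small UTF-8 file to a temporary directory and reads it back with the encoding stated explicitly. The file name and sample text are made up.

```python
import os
import tempfile

sample = "café, naïve, 日本語\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")

    # Write the file as UTF-8 regardless of the platform default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # Read it back with the same explicit encoding, as in the fix above.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

print(text == sample)
```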

I have also found this discussion from a month ago, which said it seemed close:
#3667

BERT is better than nothing, but BGE is one of the top retrieval embedding models on the Hugging Face embedding leaderboard:
https://huggingface.co/spaces/mteb/leaderboard

Hopefully this isn't too far off, as I'd love to just drop this into an app I'm building.

Thanks for all the effort on GGML - it is an amazing offering btw.

moatftw · 2023-11-18T00:57:02Z

Same here, trying to find a working model in GGUF format.
By the way, I have tried using the embedding example from the llama.cpp project. I just load the dolphin-2.1-mistral-7b.Q5_K_M.gguf file for the -m option, since I couldn't find any embedding model in the GGUF format yet. I've noticed that if I use the -ngl option to utilize the GPU, I get a different vector than when I don't use the option. For example, if I use -ngl 40, then embedding the string "abc" gives a different vector than when not using any GPU offloading.

Is there anything I did wrong?
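
One way to judge how different the two vectors are is to look at the largest element-wise gap. CPU and GPU code paths often accumulate floating-point sums in a different order, so tiny differences are common in general; whether that explains the case above is not established in this thread. The sketch below uses made-up vectors, and the `1e-3` tolerance is an arbitrary assumption rather than a llama.cpp guarantee.

```python
def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def roughly_equal(a, b, tol=1e-3):
    """True if no element differs by more than tol (tol is an assumption)."""
    return max_abs_diff(a, b) <= tol


# Hypothetical outputs for the same input with and without GPU offloading.
cpu_vec = [0.1234, -0.5678, 0.9012]
gpu_vec = [0.1235, -0.5677, 0.9010]

print(max_abs_diff(cpu_vec, gpu_vec))
print(roughly_equal(cpu_vec, gpu_vec))
```

If the gap is large rather than in the last few digits, that points to something other than rounding.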

0 replies

Johnhersh · 2023-12-10T20:17:18Z

Did anyone figure out how to make this work? I tried using the server-mode /embedding endpoint to get some embeddings but all I get is an array of 0.0 values. Do I need to use a specific model for this? Or can any model work?

10 replies

Johnhersh Jan 22, 2024

Try just not passing in the model. You don't actually need to.

candcconsulting Jan 22, 2024

Thanks again John,
You mean in the body?
If I omit it, this is what I get:

{"timestamp":1705928277,"level":"VERBOSE","function":"log_server_request","line":2742,"message":"request","request":"{\"input\" : \"This is the text to encode\"}","response":"File Not Found"}

Johnhersh Jan 22, 2024

It kinda sounds like that file is not where it should be? I run the server with --mlock which loads the entire model, so if the model is not available I'll find out when the server starts.

candcconsulting Jan 22, 2024

Thanks again John,
I changed things around so that I could launch the server with this command:

server --threads 14 --ctx-size 2048 --n-gpu-layers 33 --host 192.168.178.53 --port 1234 --verbose --embedding --mlock

However, I still get the same error:

Available slots:
 -> Slot 0 - max context: 2048
all slots are idle and system prompt is empty, clear the KV cache
{"timestamp":1705929713,"level":"INFO","function":"log_server_request","line":2737,"message":"request","remote_addr":"192.168.xxx.xx","remote_port":63966,"status":404,"method":"POST","path":"/v1/embeddings","params":{}}
{"timestamp":1705929713,"level":"VERBOSE","function":"log_server_request","line":2742,"message":"request","request":"{\"input\" : \"This is the text to encode\"}","response":"File Not Found"}

I do have some vocab files in my models folder. Do they need to be moved to the same location as the model?

candcconsulting Feb 12, 2024

@Johnhersh thanks, and sorry for the delay.
I looked through the code and saw that the version I had only provided /embedding, whereas I was testing /v1/embeddings and /embeddings.
Once I noticed that, /embedding worked as expected.
I am pulling the latest master and rebuilding to check that the OpenAI endpoint works. I expect it will, as it looks like the same code.
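
For anyone hitting the same 404, here is a rough Python sketch of a request against the /embedding path. The `content` field name is an assumption about the server's native route and may differ between versions (the logs above show `input` being sent to the OpenAI-style route), and the response layout is not shown in this thread, so the sketch returns the parsed JSON as-is. The helper names, host, and port are hypothetical. `is_all_zero` is a quick check for the all-0.0 result mentioned earlier.

```python
import json
import urllib.request


def build_embedding_request(host, port, text, path="/embedding"):
    """Build a POST request; the 'content' key is an assumed field name."""
    url = f"http://{host}:{port}{path}"
    body = json.dumps({"content": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(host, port, text):
    """Send the request and return the parsed JSON response unchanged."""
    req = build_embedding_request(host, port, text)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_all_zero(vector):
    """True if every component is 0.0, which suggests embeddings are not enabled."""
    return all(v == 0.0 for v in vector)


# Inspect the request without sending it (host and port are placeholders).
req = build_embedding_request("127.0.0.1", 1234, "This is the text to encode")
print(req.get_method(), req.full_url)
```

Calling `fetch_embedding("127.0.0.1", 1234, "some text")` requires a server that was started with --embedding, as in the command above.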
