The docs mention it but never give an example of how to run it using local inference. The only mention is the OpenAI-compatible API, but it doesn't support all of the functions.
There are two ways you can use it with a local LLM. First, you can use an external inference server like ollama, llama.cpp, vllm, etc.
E.g. to use ollama, just run the model as usual with `ollama run llama3.1:8b-instruct-fp16`, then start optillm with the base_url pointing at the inference server: `optillm --base_url http://localhost:11434/v1`. Now you can use optillm in your own code with the OpenAI client by changing the base_url to `http://localhost:8000/v1`. Here is a reddit post detailing how a user was able to run it with ollama and Open WebUI - https://www.reddit.com/r/ollama/comments/1gpsigx/optillm_with_ollama/
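Here is a minimal sketch of that route, assuming optillm is already running in front of ollama as described above and that `llama3.1:8b-instruct-fp16` has been pulled. The api_key value is just a placeholder, since ollama does not check it:

```python
from openai import OpenAI

# Point the client at the optillm proxy, not at ollama directly.
client = OpenAI(
    api_key="ollama",  # placeholder; ollama ignores the key (assumption)
    base_url="http://localhost:8000/v1",
)

response = client.chat.completions.create(
    model="llama3.1:8b-instruct-fp16",  # same model name you ran with ollama
    messages=[{"role": "user", "content": "Explain what optillm does in one sentence."}],
)
print(response.choices[0].message.content)
```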
The other option is to use the built-in local inference server. To use it, you need to set `OPTILLM_API_KEY=optillm` before running the proxy, then in the OpenAI client use the same key: `client = OpenAI(api_key="optillm", base_url="http://localhost:8000/v1")`.
Now, you can load any HF model by passing the name of the model in the OpenAI client as usual. You can also load any LoRA adapters by appending them to the model name with a `+`, as in the sketch below.
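A minimal sketch of the built-in server route; it assumes optillm was started with `OPTILLM_API_KEY=optillm` in the environment, and the HF model id and LoRA adapter id below are illustrative placeholders, not specific recommendations:

```python
from openai import OpenAI

# Same key that the proxy was started with.
client = OpenAI(api_key="optillm", base_url="http://localhost:8000/v1")

# Any HF model id can be passed as the model name; a LoRA adapter can be
# appended with "+" (both ids here are placeholders).
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct+some-user/some-lora-adapter",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```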
Does that answer your question? Let me know if you need more help or run into issues while trying it.