
I want to try to use this project for local llm, but I'm not sure how. #93

Open
matbee-eth opened this issue Nov 13, 2024 · 2 comments
Labels: question (Further information is requested)

@matbee-eth

The docs mention it but never give an example of how to run it with local inference. The only mention is the OpenAI-compatible API, but it doesn't support all of the functions.

@codelion added the question (Further information is requested) label Nov 13, 2024
@codelion (Owner) commented Nov 13, 2024

There are two ways you can use it with a local LLM. First, you can use an external inference server like Ollama, llama.cpp, vLLM, etc.

E.g. to use Ollama, just run the model as usual with `ollama run llama3.1:8b-instruct-fp16`, then start optillm with the base_url of the inference server: `optillm --base_url http://localhost:11434/v1`. Now you can use optillm in your own code with the OpenAI client by changing the base_url to `http://localhost:8000/v1`. Here is a Reddit post detailing how a user was able to run it with Ollama and Open WebUI: https://www.reddit.com/r/ollama/comments/1gpsigx/optillm_with_ollama/
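
For example, a minimal client sketch (assuming Ollama is serving `llama3.1:8b-instruct-fp16` and optillm is on its default port 8000; the api_key value and prompt are just placeholders, Ollama does not validate the key):

```python
from openai import OpenAI

# Talk to the optillm proxy (port 8000), not to Ollama directly.
# The api_key is a placeholder; Ollama does not check it.
client = OpenAI(api_key="ollama", base_url="http://localhost:8000/v1")

response = client.chat.completions.create(
    model="llama3.1:8b-instruct-fp16",  # same name used with `ollama run`
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```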

The other option is to use the built-in local inference server. To use it, set OPTILLM_API_KEY=optillm before running the proxy, then in the OpenAI client use the same key: `client = OpenAI(api_key="optillm", base_url="http://localhost:8000/v1")`.
Now you can load any HF model by passing its name in the OpenAI client as usual. You can also load any LoRA adapters by appending them with a `+`.
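
A rough sketch of that second option (the HF model id below is a placeholder, substitute whatever you want to load; it assumes the proxy was started with OPTILLM_API_KEY=optillm):

```python
from openai import OpenAI

# The proxy must have been started with OPTILLM_API_KEY=optillm
client = OpenAI(api_key="optillm", base_url="http://localhost:8000/v1")

response = client.chat.completions.create(
    # Any HF model id works here (placeholder shown). To also load a LoRA
    # adapter, append it with "+", e.g. "base-model-id+lora-adapter-id".
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```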

Does that answer your question? Let me know if you need more help or run into issues while trying it.

@matbee-eth (Author)

Ah, I have to run "optillm.py" first, then I can proxy the requests through it.
