
Feat/unload model api #97

Closed

wants to merge 5 commits

Conversation

thunhuanh

@thunhuanh thunhuanh commented Oct 31, 2023

Add API to unload model.
Resolve #86

@tikikun tikikun requested review from hiro-v and tikikun October 31, 2023 09:44
Contributor

@tikikun tikikun left a comment


Correct approach. Please do some manual testing and merge, @vuonghoainam. Thanks.

@hiro-v
Contributor

hiro-v commented Nov 2, 2023

Hi @thunhuanh,
Thank you for your contribution. It's great.

However, the server is not working as expected: it stops responding after I send the DELETE request to unload the model.

  1. Load the model:
curl --location 'http://localhost:3928/inferences/llamacpp/loadModel' \
--header 'Content-Type: application/json' \
--data '{
    "llama_model_path": "<model_path>",
    "ctx_len": 2048,
    "ngl": 100,
    "embedding": true
}'
  2. Test the model to make sure it's working correctly:
curl --location 'http://localhost:3928/inferences/llamacpp/chat_completion' \
--header 'Content-Type: application/json' \
--header 'Accept: text/event-stream' \
--header 'Access-Control-Allow-Origin: *' \
--data '{
        "messages": [
            {"content": "[INST] Write code to solve the following coding problem that obeys the constraints and passes the example test cases. Please wrap your code answer using ```:{prompt}[/INST]", "role": "system"},
            {"content": "python code for fibonacci", "role": "user"},
            {"content": "Here is a Python code for Fibonacci sequence:\n```def fib(n):if n <= 1:return else:return fib(n-1) + fib(n-2)```This code takes an integer `n` as input and returns the `n`-th Fibonacci number.", "role": "assistant"},
            {"content": "please continue", "role": "user"}
        ],
        "stream": true,
        "model": "gpt-3.5-turbo",
        "max_tokens": 2048,
        "stop": ["hello"],
        "frequency_penalty": 0,
        "presence_penalty": 0,
        "temperature": 0
     }'
  3. Try to unload the model (per your code change):
curl --location --request DELETE 'http://localhost:3928/inferences/llamacpp/unloadmodel' \
--header 'Content-Type: application/json' \
--data ''

However, after step 3, the server stops working and I cannot repeat step 1 (the behavior is similar to killing the process).
What we expect is that after step 3, I can call loadModel again.

Could you please check and make the necessary changes? Thanks.

{
  Json::Value jsonResp;
  if (model_loaded) {
    llama.unloadModel();
Contributor


As I tested, the server stops working after this line; the lines below never execute, so no result is returned.
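For context, the truncated handler above can be sketched end-to-end as follows. This is a self-contained illustration in plain C++, not the project's actual code: LlamaWrapper, the exact response strings, and the model_loaded reset are assumptions about the intended behavior.

```cpp
#include <string>

// Hypothetical stand-in for the server's llama.cpp wrapper (not the actual type).
struct LlamaWrapper {
    bool loaded = true;
    void unloadModel() { loaded = false; }  // the real method frees model memory
};

// Sketch of the unload handler's intended control flow: unload only when a
// model is loaded, reset the flag so loadModel can be called again, and
// return a JSON-style message either way.
std::string handleUnloadModel(LlamaWrapper& llama, bool& model_loaded) {
    if (model_loaded) {
        llama.unloadModel();
        model_loaded = false;  // without this reset, reloading would misbehave
        return "{\"message\": \"Model unloaded successfully\"}";
    }
    return "{\"message\": \"No model loaded\"}";
}
```

The key point for the bug reported above is that the handler must both free the model and reset the loaded flag, so a subsequent loadModel request succeeds.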

Author


Oh, sorry, my mistake. The API endpoint was mapped to the wrong handler:
it should be METHOD_ADD(llamaCPP::unloadModel, "unloadmodel", Delete);
instead of METHOD_ADD(llamaCPP::loadModel, "unloadmodel", Delete);
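The fix is purely a routing change. As a hedged illustration of why the wrong registration made step 3 fail, here is a toy route table in plain C++; the registerRoute and dispatch helpers are invented for this sketch (the real server registers routes with Drogon's METHOD_ADD macro), and the return strings are placeholders.

```cpp
#include <functional>
#include <map>
#include <string>

// Toy stand-in for an HTTP route table, keyed by "METHOD path".
using Handler = std::function<std::string()>;
std::map<std::string, Handler> routes;

void registerRoute(const std::string& key, Handler h) { routes[key] = h; }

std::string dispatch(const std::string& method, const std::string& path) {
    return routes.at(method + " " + path)();  // invoke whatever handler is registered
}

// Placeholder handlers standing in for llamaCPP::loadModel / unloadModel.
std::string loadModel()   { return "model loaded"; }
std::string unloadModel() { return "model unloaded"; }
```

With the buggy registration, registerRoute("DELETE /unloadmodel", loadModel), the DELETE request silently invokes the load handler, which in the real server receives no model path and leaves the process in a broken state; pointing the route at unloadModel resolves it.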

@hiro-v hiro-v added P1: important Important feature / fix type: enhancement labels Nov 2, 2023
@hiro-v hiro-v added this to the Nitro v2.0.0 milestone Nov 2, 2023
@thunhuanh
Author

thunhuanh commented Nov 2, 2023

I have fixed the issue and tested it locally; the changes should work now. @vuonghoainam @tikikun

@tikikun
Contributor

tikikun commented Nov 7, 2023

Hi @thunhuanh, things have been quite intense; I need to refactor this PR into another PR a bit before merging. I will credit it back to this issue.

@tikikun
Contributor

tikikun commented Nov 13, 2023

Hi @thunhuanh, I have added your change to this PR in #122, with a few small modifications. Thank you very much for taking the time to contribute to the project.

@tikikun tikikun closed this Nov 13, 2023
Labels
P1: important Important feature / fix
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

feat: Add API to unload model
4 participants