Issue with Concurrent Query Processing and Document Upload #1848

llmwesee · 2024-09-18T05:50:05Z

I have implemented a solution using vLLM on an A100 server to support multiple users. However, I have encountered an issue:

While one user's query is being processed, other users are unable to upload documents into the UserData or MyData collections. The document upload process gets stuck at the processing stage without any errors appearing in the terminal or UI. Additionally, the document is not uploaded successfully.

Can you suggest ways to decouple the query processing, document upload, and user interface programs so they can run independently of each other?

Alternatively, can we build, or use prebuilt, separate APIs to manage these programs in the backend?
Please provide suggestions or potential solutions.


pseudotensor · 2024-09-20T02:53:18Z

They should all be independent unless you changed CONCURRENCY_COUNT to be 1. This is tested normally. The backend has no issues with this at all.
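
To illustrate why a concurrency count of 1 would produce exactly the symptom described, here is a minimal sketch. It is not h2oGPT code: a thread pool stands in for the request queue, and `slow_query` and `upload` are hypothetical stand-ins for a chat request and a document upload. With one worker the upload cannot start until the query finishes; with two it starts almost immediately.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_query(seconds: float) -> None:
    """Hypothetical stand-in for a long-running chat request."""
    time.sleep(seconds)


def upload() -> float:
    """Hypothetical stand-in for a document upload; returns the time it started."""
    return time.monotonic()


def upload_wait(concurrency_count: int, query_seconds: float = 0.5) -> float:
    """Return how long the upload sat in the queue before it started."""
    with ThreadPoolExecutor(max_workers=concurrency_count) as pool:
        submitted = time.monotonic()
        pool.submit(slow_query, query_seconds)      # first user: long query
        upload_started = pool.submit(upload).result()  # second user: upload
    return upload_started - submitted


if __name__ == "__main__":
    print(f"concurrency 1: upload waited {upload_wait(1):.2f}s")
    print(f"concurrency 2: upload waited {upload_wait(2):.2f}s")
```

If uploads stall only while a query is running, the effective concurrency limit is the first thing worth checking.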

pseudotensor · 2024-09-20T02:54:18Z

Once you have that working, I can explain how to make it even more efficient using the function_server.

llmwesee · 2024-09-20T03:50:05Z

This is the command I use to run h2oGPT with login:
python generate.py --base_model=meta-llama/Meta-Llama-3.1-8B-Instruct --score_model=None --langchain_mode='UserData' --user_path=user_path --auth='' --use_auth_token=True --visible_visible_models=False --max_seq_len=8192 --max_max_new_tokens=4096 --max_new_tokens=4096 --min_new_tokens=256
Can you show me some examples of running h2oGPT as a full backend server, with everything from query processing to document uploading working concurrently and independently for multiple users? I want to integrate its backend with a React or Next.js frontend that has the same functionality as h2oGPT, plus a data lake for all related documents.

pseudotensor · 2024-09-30T06:28:17Z

I guess I need to ask how you are seeing things blocked. For example, if you have pytest test code that shows how things block each other (e.g. a long doc add in one test while chat is blocked in another, run with -n 2), or if you just show a video of the UI and what you are doing, I can mimic it and see whether I see what you are seeing.
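
A reproduction along those lines could be as small as the sketch below. It is only an outline under assumptions: it uses threads in one process rather than two pytest workers, and `add_doc` and `chat` are hypothetical placeholders for whatever client calls you actually make against the server. The harness times a chat call on its own, then again while an upload is in flight. If the second number grows by roughly the upload time, the calls are being serialized somewhere.

```python
import threading
import time
from typing import Callable, Tuple


def elapsed(fn: Callable[[], None]) -> float:
    """Run fn and return how many seconds it took."""
    start = time.monotonic()
    fn()
    return time.monotonic() - start


def chat_latency_during_upload(
    add_doc: Callable[[], None],
    chat: Callable[[], None],
    head_start: float = 0.1,
) -> Tuple[float, float]:
    """Return (chat latency alone, chat latency while add_doc is running)."""
    alone = elapsed(chat)
    uploader = threading.Thread(target=add_doc)
    uploader.start()
    time.sleep(head_start)  # give the upload time to reach the server first
    during = elapsed(chat)
    uploader.join()
    return alone, during


if __name__ == "__main__":
    # Fake "server" that serializes everything behind one lock,
    # to show what a blocked result looks like.
    server_lock = threading.Lock()

    def fake_add_doc() -> None:
        with server_lock:
            time.sleep(0.5)

    def fake_chat() -> None:
        with server_lock:
            time.sleep(0.1)

    alone, during = chat_latency_during_upload(fake_add_doc, fake_chat)
    print(f"chat alone: {alone:.2f}s, chat during upload: {during:.2f}s")
```

Swapping the two fake functions for real client calls, and posting the two numbers, would show concretely what is blocking what.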

pseudotensor · 2024-10-03T19:31:55Z

As for the function server, you can try it. Just add to CLI:

 --function_server=True --function_server_workers=5 --multiple_workers_gunicorn=True --function_server_port=5002 --function_api_key=API_KEY

llmwesee · 2024-10-04T03:56:58Z

The function server fails when it is reached through upload_api and add_file_api:

Traceback (most recent call last):
  File "/home/abc/Documents/xxxx/xxxx/src/gpt_langchain.py", line 9383, in update_user_db
    return _update_user_db(file, db1s=db1s,
  File "/home/xxxx/src/gpt_langchain.py", line 9664, in _update_user_db
    sources = call_function_server('0.0.0.0', function_server_port, 'path_to_docs', (file,), simple_kwargs,
  File "/home/xxxx/src/function_client.py", line 50, in call_function_server
    execute_result = execute_function_on_server(host, port, function_name, args, kwargs, use_disk, use_pickle,
  File "/home/xxxx/src/function_client.py", line 21, in execute_function_on_server
    response = requests.post(url, json=payload, headers=headers)
  File "/home/xxxx/lib/python3.10/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/home/xxxx/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/xxxx/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/xxxx/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/xxxx/lib/python3.10/site-packages/requests/adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=5002): Max retries exceeded with url: /execute_function/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1deb5867a0>: Failed to establish a new connection: [Errno 111] Connection refused'))

pseudotensor · 2024-10-04T15:46:02Z

It just looks like the function server isn't even up. Perhaps you have something else on that port etc. Check startup logs.
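
One quick way to check, independent of h2oGPT, is to test whether anything accepts TCP connections on the port. The sketch below uses only the Python standard library; the helper name `port_is_listening` is made up for illustration, and 5002 is simply the `--function_server_port` value used earlier in this thread.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes "connection refused" and timeouts
        return False


if __name__ == "__main__":
    print(port_is_listening("127.0.0.1", 5002))
```

`False` matches the `[Errno 111] Connection refused` in the traceback: nothing is listening. `True` only means some process is listening, which could be the function server or a different program already holding that port.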

llmwesee · 2024-10-07T06:44:56Z

> They should all be independent unless you changed CONCURRENCY_COUNT to be 1. This is tested normally. The backend has no issues with this at all.

When I set the concurrency count to 64:

python generate.py --base_model=meta-llama/Meta-Llama-3.1-8B-Instruct --score_model=None --langchain_mode='UserData' --user_path=user_path --use_auth_token=True --visible_visible_models=False --max_seq_len=8192 --max_max_new_tokens=4096 --max_new_tokens=4096 --min_new_tokens=256 --api_open=True --allow_api=True --max_quality=True --function_server=True --function_server_workers=5 --multiple_workers_gunicorn=True --function_server_port=5002 --function_api_key=API_KEY --concurrency_count=64

then the following error is shown:

File "/home/xxxx/src/gen.py", line 1736, in main
    raise ValueError(
ValueError: Concurrency count > 1 will lead to mixup in cache use for local LLMs, disable this raise at own risk.

pseudotensor · 2024-10-07T15:09:12Z

Correct. I recommend vLLM for handling concurrency well; transformers is not itself thread-safe.
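
For readers wondering what "not thread-safe" implies in practice, the generic pattern for sharing such an object between request threads is to serialize calls with a lock. The sketch below is an illustration only, not how h2oGPT or transformers is implemented: `FakeModel` is a hypothetical stand-in that keeps per-call state on the instance, and `LockedModel` is a made-up wrapper name.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


class FakeModel:
    """Hypothetical model that keeps per-call state on the shared instance."""

    def __init__(self) -> None:
        self.cache = None  # shared scratch state, reused across calls

    def generate(self, prompt: str) -> str:
        self.cache = prompt   # write shared state
        time.sleep(0.01)      # without a lock, another thread could overwrite it here
        return f"echo:{self.cache}"


class LockedModel:
    """Wrapper that lets only one generate() call run at a time."""

    def __init__(self, model: FakeModel) -> None:
        self._model = model
        self._lock = threading.Lock()

    def generate(self, prompt: str) -> str:
        with self._lock:
            return self._model.generate(prompt)


def run_concurrently(model, prompts: List[str]) -> List[str]:
    """Call model.generate for every prompt from separate threads."""
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        return list(pool.map(model.generate, prompts))


if __name__ == "__main__":
    prompts = [f"p{i}" for i in range(8)]
    print(run_concurrently(LockedModel(FakeModel()), prompts))
```

The lock keeps answers from mixing but also means generations run one at a time, which is why a server built for concurrent requests, such as vLLM, is the better fit for multiple users.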

h2oai deleted a comment Oct 24, 2024
