[Bug] Undeterministic Batch Formation #208
Comments
@elvin-n will take a look.
I think this is inherent to the async request add / batch creation in the staging engine. Not sure if we can make it deterministic. @yelite
But that test adds requests one by one in a for loop, and the engine pushes requests to queues before they reach the worker batch. I don't see anything obvious in that code path that would create non-deterministic behavior.
By "async" I meant the request arrives at the worker via
Yes, that's async and very likely the cause of the non-deterministic batching here. I don't have a good way to fix this without impacting performance. In #193 I plan to remove the sync engine, but add some flags to the staging engine to run things in a more synchronous way, so we can have more deterministic behavior in unit tests if necessary.
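To make the timing dependence concrete, here is a minimal toy model, not the actual engine code. It assumes a producer that enqueues requests one at a time and a worker that drains the whole queue into a batch whenever it happens to wake up. The names `form_batches`, `batch_sizes`, and `drain_after` are hypothetical and exist only for this illustration.

```python
from collections import deque


def form_batches(num_requests, drain_after):
    """Toy model of async batch formation (hypothetical, not the engine's API).

    The producer enqueues requests one by one. `drain_after` is the set of
    request counts after which the worker happens to wake up and drain the
    queue into a batch. In a real async engine these points depend on thread
    or process scheduling, so they vary from run to run.
    """
    queue = deque()
    batches = []
    for i in range(num_requests):
        queue.append(f"req-{i}")
        if (i + 1) in drain_after:
            # The worker wakes up here and takes everything queued so far.
            batches.append(list(queue))
            queue.clear()
    if queue:
        # Whatever is left is picked up on the worker's next wake-up.
        batches.append(list(queue))
    return batches


def batch_sizes(num_requests, drain_after):
    return [len(batch) for batch in form_batches(num_requests, drain_after)]


print(batch_sizes(4, set()))  # worker wakes only after all adds: [4]
print(batch_sizes(4, {2}))    # worker wakes after the 2nd add:   [2, 2]
print(batch_sizes(4, {1}))    # worker wakes after the 1st add:   [1, 3]
```

Under these assumptions, the same four requests yield 4, 2/2, or 1/3 splits depending only on when the worker drains the queue, which matches the pattern reported in this issue.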
There are three use cases:
Async points 2 and 3 were added by design; they help in real-world situations and cannot be removed. The only predictable flow is with the sync engine and explicit calls to add/step.
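As a rough sketch of why explicit add/step calls give a predictable flow, consider the toy engine below. It is an assumption-laden illustration, not the project's sync engine: `ToySyncEngine`, its `add` and `step` methods, and `max_batch_size` are all hypothetical names chosen for this example.

```python
class ToySyncEngine:
    """Hypothetical synchronous engine: batches form only when step() is called."""

    def __init__(self, max_batch_size=8):
        self.max_batch_size = max_batch_size
        self.pending = []

    def add(self, request_ids):
        # Requests are appended in the caller's thread; no worker runs concurrently.
        self.pending.extend(request_ids)

    def step(self):
        # The batch is formed here, from everything added so far (up to the cap).
        batch = self.pending[: self.max_batch_size]
        self.pending = self.pending[self.max_batch_size:]
        return batch


engine = ToySyncEngine()
for i in range(4):
    engine.add([f"req-{i}"])

first_batch = engine.step()
print(len(first_batch))  # 4
```

Because nothing consumes requests between the `add` calls and the `step` call, the first batch always contains all four requests in this model, regardless of scheduling.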
The default run for serve/tests/test_engine.py first adds 4 requests and then starts the engine. I expected this to form a single prefill batch with 4 requests.
However, the behavior is non-deterministic: sometimes it forms two prefill batches of 2 requests each, and sometimes two prefill batches of 1 and 3 requests.
The debug log does not provide any useful information.