executor_type="funcx" fails with ValueError: The tasks queue is empty, no tasks were submitted for training! #39

Open
vinaBira opened this issue Jul 7, 2023 · 4 comments

Comments

@vinaBira

vinaBira commented Jul 7, 2023

I'm getting the following error when running the quickstart_pytorch.py tutorial with executor_type="funcx":
Traceback (most recent call last):
  File "flox/examples/quickstart_pytorch/quickstart_pytorch.py", line 130, in <module>
    main()
  File "flox/examples/quickstart_pytorch/quickstart_pytorch.py", line 126, in main
    flox_controller.run_federated_learning()
  File "/home/edg4/FLoX/flox/controllers/MainController.py", line 563, in run_federated_learning
    tasks = self.on_model_broadcast()
  File "/home/edg4/FLoX/flox/controllers/MainController.py", line 371, in on_model_broadcast
    raise ValueError(
ValueError: The tasks queue is empty, no tasks were submitted for training!

@nikita-kotsehub
Collaborator

@vinaBira, can you check whether your endpoints are active? Your console log should print the status of each endpoint. If all of them are offline, no tasks are submitted for training, so the loop terminates with this ValueError.
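The failure mode described above can be sketched roughly as follows. This is a minimal illustration, not FLoX's actual `on_model_broadcast` code: the `Endpoint` class, its `online` flag, and the `broadcast_model` function are hypothetical names assumed for the example. Offline endpoints are skipped, and if none remain, the task list is empty and the same ValueError is raised.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Endpoint:
    # Hypothetical stand-in for an endpoint as seen by the controller.
    endpoint_id: str
    online: bool


def broadcast_model(endpoints: List[Endpoint], submit: Callable[[str], str]) -> List[str]:
    """Submit one training task per online endpoint (illustrative sketch only)."""
    tasks = []
    for ep in endpoints:
        if not ep.online:
            # Offline endpoints are skipped, so they contribute no task.
            print(f"Endpoint {ep.endpoint_id} is offline, skipping")
            continue
        tasks.append(submit(ep.endpoint_id))
    if not tasks:
        # Every endpoint was skipped: there is nothing to wait on, so fail loudly.
        raise ValueError(
            "The tasks queue is empty, no tasks were submitted for training!"
        )
    return tasks


if __name__ == "__main__":
    # With every endpoint offline, the guard fires.
    try:
        broadcast_model([Endpoint("edge-1", False), Endpoint("edge-2", False)], lambda eid: f"task-{eid}")
    except ValueError as err:
        print(err)
```

In other words, the ValueError is a symptom: the real question is why every endpoint was filtered out (or failed) before a task could be queued.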

@vinaBira
Author

vinaBira commented Jul 8, 2023

@nikita-kotsehub I am using edge devices, and yes, they are active. Is there a particular state the client machines need to be in?

@vinaBira
Author

@nikita-kotsehub Please refer to the attached screenshots and let me know if the configuration is wrong at any point.
[Attached screenshots: Screen Shot 2023-07-18 at 12 58 18 PM; Screen Shot 2023-07-18 at 12 58 04 PM]

@nikita-kotsehub
Collaborator

nikita-kotsehub commented Jul 18, 2023

@vinaBira try to run simple funcX tasks on those endpoints before trying out flox. You can find tutorials for simple funcX tasks here: https://funcx.org/.
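Such a round-trip check can be written against Python's standard `concurrent.futures.Executor` interface, shown here as a sketch. It is illustrative only: `smoke_test` and `add` are hypothetical helpers, and the example runs locally with a `ThreadPoolExecutor`. The funcX SDK's `FuncXExecutor` is intended to follow the same `submit()`/`result()` pattern, but whether it can be swapped in directly for your SDK version is an assumption to confirm against the funcX documentation.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def add(a: int, b: int) -> int:
    # Trivial task: if this round-trips, submission and result retrieval work.
    return a + b


def smoke_test(executor: Executor, timeout: float = 60.0) -> bool:
    """Submit one trivial task and report whether the correct result came back."""
    future = executor.submit(add, 2, 3)
    try:
        # result() raises a timeout error if the task never completes in time.
        return future.result(timeout=timeout) == 5
    except Exception as exc:
        print(f"Smoke test failed: {exc!r}")
        return False


if __name__ == "__main__":
    # Local stand-in; an executor bound to a remote endpoint would go here instead.
    with ThreadPoolExecutor(max_workers=1) as pool:
        print("endpoint OK" if smoke_test(pool) else "endpoint NOT OK")
```

Keeping the check behind the generic `Executor` type means the same function can be pointed at a local pool first and a remote endpoint second, which separates "my endpoint is broken" from "FLoX is not submitting tasks".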

If that succeeds, then the problem is that the tasks are not being submitted to the endpoints. I'd suggest looking through lines 311-373 of flox/controllers/MainController.py and inserting logger or print statements to identify where the failure occurs.
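One way to follow that suggestion is to put a `logging` call at each decision point so the log shows exactly where endpoints drop out. The sketch below is hypothetical: `submit_all`, the endpoint dicts with `id` and `status` keys, and the `"online"` status string are assumptions, not FLoX's data model. It also logs any exception raised during submission, since a silently swallowed error would leave the queue empty in the same way.

```python
import logging
from typing import Callable, Dict, List

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("flox.debug")  # hypothetical logger name


def submit_all(endpoints: List[Dict[str, str]], submit: Callable[[str], object]) -> List[object]:
    """Try to submit one task per endpoint, logging why each one is or is not queued."""
    tasks = []
    logger.debug("Starting submission to %d endpoint(s)", len(endpoints))
    for ep in endpoints:
        status = ep.get("status")
        logger.debug("Endpoint %s reports status %r", ep["id"], status)
        if status != "online":
            logger.warning("Skipping endpoint %s (status %r)", ep["id"], status)
            continue
        try:
            tasks.append(submit(ep["id"]))
            logger.debug("Queued task for endpoint %s", ep["id"])
        except Exception:
            # logger.exception records the full traceback at ERROR level.
            logger.exception("Submission to endpoint %s failed", ep["id"])
    logger.debug("%d task(s) queued", len(tasks))
    return tasks
```

Reading the resulting log top to bottom shows whether every endpoint was skipped for its status or whether submission itself raised.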

Also, did you try running the examples under flox/examples? If not, reading the setup instructions might help.
