Currently, pandarallel creates one progress bar per worker. On systems with a large number of cores (say 128 or more), seeing that many progress bars can be overwhelming. In these situations, it may be more useful to display a smaller number of progress bars (not necessarily a single overall bar), with each worker mapped to one of the displayed progress bars.
What is proposed:
For N workers, offer the option to display M progress bars where N >= M and each worker contributes to one progress bar (i.e. worker n contributes to progress bar m such that m = (n % M)).
If an error occurs during execution, that worker's progress bar (which may represent progress from multiple workers) will indicate an error occurred, matching current functionality.
Keep the existing default behavior unchanged so that not specifying a maximum number of progress bars to display results in as many progress bars as workers.
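The mapping described above can be sketched in a few lines of plain Python. This is only an illustration of the m = (n % M) rule, not the code from the pull request; the helper names `bar_index` and `group_workers` are hypothetical and are not part of pandarallel.

```python
from collections import defaultdict


def bar_index(worker_index: int, num_bars: int) -> int:
    """Return the progress bar that a worker reports to (m = n % M)."""
    if num_bars < 1:
        raise ValueError("num_bars must be at least 1")
    return worker_index % num_bars


def group_workers(num_workers: int, num_bars: int) -> dict[int, list[int]]:
    """Group worker indices by the progress bar they contribute to."""
    # Never display more bars than there are workers (N >= M).
    num_bars = min(num_bars, num_workers)
    groups = defaultdict(list)
    for n in range(num_workers):
        groups[bar_index(n, num_bars)].append(n)
    return dict(groups)


# Hypothetical example: 8 workers sharing 3 progress bars.
groups = group_workers(num_workers=8, num_bars=3)
print(groups)
```

With the modulo rule, the number of workers per bar differs by at most one, so the displayed bars stay roughly comparable when every worker receives a similar share of the data.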
Additional motivation:
We have successfully used pandarallel on systems with far more than 128 cores, where seeing as many progress bars as workers is genuinely problematic. We benefit greatly from the progress bars and do not want to simply disable them -- we want to monitor the progress of our parallel_apply() and parallel_map() operations in a digestible way, without flooding the screen / notebook with too much information.
Proposed implementation:
A working implementation has been prepared along with unit tests -- a pull request will be added to this issue.
Example of the code from PR #243 in use in the IPython console:
Of the 20 workers, each gets 5M rows from a 100M row DataFrame. Because 20 is not divisible by 3, the first 2 progress bars each represent 7 workers and the last progress bar represents 6 workers.
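The arithmetic behind that split can be checked with a short standalone snippet. It assumes the m = (n % M) assignment from the proposal and uses only the standard library; the constant names are illustrative and do not come from the PR.

```python
from collections import Counter

NUM_WORKERS = 20             # workers in the example
NUM_BARS = 3                 # progress bars displayed
ROWS_PER_WORKER = 5_000_000  # 100M rows split evenly across 20 workers

# Count how many workers land on each bar under m = n % M.
workers_per_bar = Counter(n % NUM_BARS for n in range(NUM_WORKERS))

# Total rows each displayed bar has to account for.
rows_per_bar = {
    m: count * ROWS_PER_WORKER for m, count in sorted(workers_per_bar.items())
}

for m, rows in rows_per_bar.items():
    print(f"bar {m}: {workers_per_bar[m]} workers, {rows:,} rows")
```

Under these assumptions the first two bars each cover 35M rows and the last covers 30M, which together make up the full 100M row DataFrame.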