Batch (parallelized) calculations of beta Shapley #427
Comments
Note that batch size will change the behaviour of algorithms, because convergence criteria are checked in the main process. The most obvious case is MaxChecks, but all other stopping criteria will also only be evaluated every batch_size samples. The issue arises with models that train very quickly, or with sample sizes that are too small. I would have expected that using a process pool would alleviate the problem. Is joblib doing this? Anyway, the proper solution is IMO not to batch but to use queues, like we did a long time ago. And if we decide to use batches as a stopgap, then it might make sense to batch by the total size of all samples sent instead of their number, because training on 5 points might be faster than training on 5000. I'm not sure this is worth it, though, because we are going to break the batch_size interface once we implement queues, and that would be bad.
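A minimal sketch of what batching by total sample size could look like, assuming each sample exposes its number of points via len(); the helper name batch_by_total_size and the max_points parameter are hypothetical, not existing API:

```python
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T", bound=Sequence)


def batch_by_total_size(samples: Iterable[T], max_points: int) -> Iterator[List[T]]:
    """Group samples into batches whose combined number of training points
    stays at or below max_points (a batch always holds at least one sample)."""
    batch: List[T] = []
    points = 0
    for sample in samples:
        if batch and points + len(sample) > max_points:
            yield batch
            batch, points = [], 0
        batch.append(sample)
        points += len(sample)
    if batch:
        yield batch
```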
Yes, but each sample of the batch is processed in order, and done is checked after each sample. If done is true in the middle of the batch, the rest of the calculations get thrown away. The PR comes in handy for the current experiments to speed things up (e.g. by a factor of 14 for logistic regression). We don't have to merge it, in case we settle on a different parallelization scheme. My favorite here would also be task queues, although ProcessPoolExecutor probably has a task queue under the hood, and as far as I know the loky backend manages a ProcessPoolExecutor internally.
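For illustration, a minimal sketch of that per-sample check; the names compute_marginal and done are hypothetical stand-ins for the utility evaluation and the stopping criterion:

```python
def process_batch(batch, compute_marginal, done):
    """Evaluate a batch sequentially; if the stopping criterion signals
    convergence mid-batch, the remaining samples are simply discarded."""
    results = []
    for sample in batch:
        results.append(compute_marginal(sample))
        if done():  # checked after every sample, not only once per batch
            break   # the rest of the batch is thrown away
    return results
```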
Come to think of it, process pools have running processes, but they start afresh with each new task, and this can take too long. They cannot be using a task queue like I meant, since they couldn't guarantee a clean slate if they did. We almost certainly want our own queues. As to your solution, given the massive speed-up you mention, we can add it if we clearly document that the interface is temporary and will be deprecated. It is crucial to ensure that randomness is handled properly, though. What do you think @AnesBenmerzoug?
With the current idiom for job submission, timing would probably have to happen within the caller and not the sampler, since we take a bunch of samples in one go and submit the jobs.
@kosmitive I'm going to merge your PR (after some changes to documentation and deprecation notices that I've pushed), but I think batching belongs in the samplers. A simple way to do it is to wrap them in a generic batching decorator.
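For example, a generic wrapper along these lines could turn any per-sample sampler into a batched one (the batched helper and the example sampler name are hypothetical, not pyDVL API):

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

SampleT = TypeVar("SampleT")


def batched(sampler: Iterable[SampleT], batch_size: int) -> Iterator[List[SampleT]]:
    """Wrap any sampler so that it yields lists of up to batch_size samples
    instead of individual samples."""
    iterator = iter(sampler)
    while batch := list(islice(iterator, batch_size)):
        yield batch


# usage sketch: for batch in batched(permutation_sampler, 32): submit(batch)
```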
Okay, when I think about it, the sampler is indeed the clearer location. Furthermore, the decorator is way cleaner. But if we remove it later, it might not be best to embed it so deeply in the framework. I leave the PR for editing and merging to you @mdbenito. If you need some work from my side, let me know. @AnesBenmerzoug We could transfer the stopping signal indirectly to the worker processes by introducing a stop queue, which could also carry the speed recommendation of each worker.
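A rough sketch of how such a stop signal could reach the workers, using a shared Event from a Manager as a simple stand-in for a fully-fledged stop queue (all names here are hypothetical placeholders):

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager


def run_batch(batch, stop_event):
    """Worker: process a batch, but bail out early if the main process
    has already signalled convergence through the shared event."""
    results = []
    for sample in batch:
        if stop_event.is_set():
            break
        results.append(sample * sample)  # placeholder for a marginal evaluation
    return results


if __name__ == "__main__":
    with Manager() as manager, ProcessPoolExecutor(max_workers=2) as executor:
        stop = manager.Event()
        batches = [list(range(i, i + 5)) for i in range(0, 20, 5)]
        futures = [executor.submit(run_batch, b, stop) for b in batches]
        # the main process would call stop.set() once the stopping
        # criterion is met, e.g. after inspecting completed results
        for future in futures:
            print(future.result())
```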
Good point. Although it only shifts the convention for samplers to return lists of samples instead of single items, which is not so bad. Whether or not using a decorator is good is not so clear. For instance, we don't have explicit type information about the batched samplers, and would have to create a protocol or maybe wrap the samplers at runtime, although... yuck. Anyway, we can think about that some other time.
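If we did want explicit typing, a structural protocol for batched samplers could look roughly like this (BatchedSampler is a hypothetical name, not an existing class):

```python
from typing import Iterator, Protocol, Sequence, TypeVar

SampleT = TypeVar("SampleT", covariant=True)


class BatchedSampler(Protocol[SampleT]):
    """Structural type for samplers that yield whole batches of samples,
    so callers can be type-checked against the batched convention."""

    def __iter__(self) -> Iterator[Sequence[SampleT]]:
        ...
```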
Fine-grained marginal evaluations (used in our implementation of Beta Shapley) in combination with a parallel executor have sub-optimal performance. As a result, only a fraction of the available computing power is used. I like the way Beta Shapley is implemented, with the breakdown into marginal computations. To maintain that structure, we should think about batching the marginal evaluations to increase performance. Obviously the optimal value depends on the number of cores used, but a batch size between 30 and 50 can be chosen as a first guess.
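As a rough illustration of the intended change, submitting one task per batch of marginal evaluations rather than one task per sample might look like this (all names are hypothetical placeholders, not the actual implementation):

```python
from concurrent.futures import ProcessPoolExecutor


def marginals_for_batch(batch):
    """Placeholder for evaluating the marginal utility of each sample in the
    batch; the real code would train and score a model per sample."""
    return [0.5 * sample for sample in batch]


if __name__ == "__main__":
    samples = list(range(1000))  # stand-in for sampled subsets
    batch_size = 40              # first guess from the 30-50 range above
    batches = [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
    with ProcessPoolExecutor() as executor:
        # one task per batch instead of one task per marginal evaluation
        results = [value for batch in executor.map(marginals_for_batch, batches)
                   for value in batch]
    print(len(results))
```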