
Batch (parallelized) calculations of beta Shapley #427

Closed · kosmitive opened this issue Sep 5, 2023 · 8 comments · Fixed by #428
Labels: enhancement (New feature or request), light task (A light task)

@kosmitive (Contributor)

Fine-grained marginal evaluations (used in our implementation of Beta Shapley) in combination with a parallel executor have sub-optimal performance: only a fraction of the available computing power is used. I like the way Beta Shapley is implemented, with the breakdown into marginal computations. To maintain that structure, we should think about batching the marginal evaluations to increase performance. Obviously the optimal value depends on the number of cores used, but a batch size between 30 and 50 can be chosen as a first guess.
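To make the idea concrete, here is a minimal, hypothetical sketch of batching marginal evaluations before submitting them to a process pool. `compute_marginal`, the executor setup and the default batch size are illustrative stand-ins, not pyDVL's actual API:

```python
# Hypothetical sketch: submit one task per batch of samples instead of one
# task per marginal evaluation, to amortize per-task overhead.
from concurrent.futures import ProcessPoolExecutor
from itertools import islice


def compute_marginal(sample):
    # Placeholder for training the model with and without the point in
    # `sample` and returning the difference in utility.
    return 0.0


def batched(samples, batch_size):
    """Group an iterable of samples into lists of at most batch_size items."""
    it = iter(samples)
    while batch := list(islice(it, batch_size)):
        yield batch


def compute_batch(batch):
    # One task per batch instead of one task per marginal.
    return [compute_marginal(s) for s in batch]


def run(samples, batch_size=32, n_jobs=4):
    results = []
    with ProcessPoolExecutor(max_workers=n_jobs) as executor:
        for batch_results in executor.map(compute_batch, batched(samples, batch_size)):
            results.extend(batch_results)
    return results
```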

@kosmitive added the enhancement and light task labels (and added and then removed the design-problem label) on Sep 5, 2023
@kosmitive linked a pull request (#428) on Sep 5, 2023 that will close this issue
@kosmitive self-assigned this on Sep 5, 2023
@mdbenito (Collaborator) commented Sep 6, 2023

Note that the batch size will change the behaviour of algorithms, because convergence criteria are checked in the main process. The most obvious case is MaxChecks, but all other stopping criteria will also be evaluated only once every batch_size samples.

The issue is with models that are trained very quickly or sample sizes that are too small. I would have expected that using a process pool would alleviate the problem. Is joblib doing this?

Anyway, the proper solution is IMO not to batch but to use queues like we did a long time ago.

And if we decide to use batches as a stopgap, then it might make sense to batch by the total size of all samples sent, instead of by their number, because training on 5 points might be faster than training on 5000. I'm not sure this is worth it, though, because we are going to break the batch_size interface once we implement queues, and that would be bad.
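A sketch of what batching by total size could look like, assuming (hypothetically) that each sample exposes the training subset it refers to:

```python
# Hypothetical sketch: group samples so that each batch contains roughly the
# same total number of training points, rather than the same number of samples.
# `sample.subset` is an assumed attribute holding the training subset.
def batched_by_total_size(samples, max_points_per_batch=5000):
    batch, total_points = [], 0
    for sample in samples:
        batch.append(sample)
        total_points += len(sample.subset)
        if total_points >= max_points_per_batch:
            yield batch
            batch, total_points = [], 0
    if batch:
        yield batch
```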

@kosmitive (Contributor, Author) commented Sep 6, 2023

Yes, but each sample of the batch is processed in order, and after each sample done is checked. If done becomes true in the middle of a batch, the rest of the calculations are thrown away. The PR comes in handy for the current experiments to speed things up (e.g. by a factor of 14 for logistic regression). We don't have to merge it in case we rely on a different parallelization scheme.
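Roughly, the in-batch behaviour described here could be pictured as follows, with `done` and `update_result` as hypothetical placeholders for the stopping criterion and the result accumulator:

```python
# Illustration only: batch results are consumed in order and the stopping
# criterion is re-checked after each one, so once it triggers mid-batch the
# remaining marginals of that batch are discarded.
def consume_batch(batch_results, update_result, done):
    for marginal in batch_results:
        update_result(marginal)
        if done():
            return True  # the rest of the batch is thrown away
    return False
```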

My favorite here would also be task queues, although ProcessPoolExecutor probably has a task queue under the hood, and as far as I know the loky backend manages a ProcessPoolExecutor internally.

@mdbenito (Collaborator) commented Sep 7, 2023

Come to think of it, process pools have running processes but they start afresh with each new task, and this can take too long. They cannot be using a task queue like I meant, since they couldn’t guarantee a clean slate if they did. We almost certainly want our own queues.

As to your solution, given the massive speed up you mention, we can add it if we clearly document that the interface is temporary and will be deprecated. It is crucial to ensure that randomness is handled properly, though.

What do you think @AnesBenmerzoug ?
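For reference, a rough sketch of what a hand-rolled queue scheme with explicit per-worker seeding could look like; this is purely illustrative, not pyDVL's parallel backend, and the marginal computation is a placeholder:

```python
# Purely illustrative sketch of a queue-based scheme with per-worker seeding.
# Workers pull tasks from a shared queue until they receive a sentinel, and
# each one gets its own independently seeded RNG.
import multiprocessing as mp

import numpy as np


def worker(task_queue, result_queue, seed):
    rng = np.random.default_rng(seed)  # reproducible, independent per worker
    while True:
        task = task_queue.get()
        if task is None:  # sentinel: no more work
            break
        # Placeholder for the actual marginal-utility computation.
        result_queue.put((task, rng.normal()))


def run(n_tasks=100, n_workers=4, seed=42):
    task_queue, result_queue = mp.Queue(), mp.Queue()
    seeds = np.random.SeedSequence(seed).spawn(n_workers)
    workers = [
        mp.Process(target=worker, args=(task_queue, result_queue, s))
        for s in seeds
    ]
    for w in workers:
        w.start()
    for task in range(n_tasks):
        task_queue.put(task)
    for _ in workers:
        task_queue.put(None)  # one sentinel per worker
    results = [result_queue.get() for _ in range(n_tasks)]
    for w in workers:
        w.join()
    return results
```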

@AnesBenmerzoug (Collaborator)

@mdbenito I agree with what you said. Given the speed-up that this brings, I also think we should merge it.

We can go further in this direction by automatically finding the right batch size (this is how it is done in joblib, for example), or we can consider using queues once again.
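A very simplified sketch of such duration-based adaptation; the target duration and the update rule are arbitrary illustrative choices, not joblib's actual heuristic:

```python
import time


def adapt_batch_size(current_size, last_duration, target_duration=0.5,
                     min_size=1, max_size=512):
    """Rescale the batch size so the next batch takes roughly target_duration."""
    if last_duration <= 0:
        return current_size
    new_size = int(current_size * target_duration / last_duration)
    return max(min_size, min(max_size, new_size))


def timed(fn, batch):
    """Run fn on a batch and return its result together with the wall time."""
    start = time.perf_counter()
    result = fn(batch)
    return result, time.perf_counter() - start
```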

@mdbenito added this to the v0.7.1 milestone on Sep 17, 2023
@mdbenito (Collaborator)

> We can go further in this direction by automatically finding the right batch size (this is how it is done in joblib, for example), or we can consider using queues once again.

With the current idiom for job submission, timing would probably have to happen within the caller and not the sampler, since we take a bunch of samples in one go and submit the jobs.

@mdbenito (Collaborator) commented Sep 17, 2023

@kosmitive I'm going to merge your PR (after some changes to documentation and deprecation notices that I've pushed), but I think batching belongs in the samplers. A simple way to do it is to wrap them in a generic batched decorator, like this one I wrote as a demo. I can add it some other time, if we decide to keep the batching mechanism.
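The linked demo is not reproduced here; a sketch of the idea, assuming a sampler is a callable returning an iterable of single samples, could look like this:

```python
# Not the decorator from the linked demo, just a sketch of the idea.
from functools import wraps
from itertools import islice


def batched(batch_size):
    """Wrap a sampler so that it yields lists of up to batch_size samples."""

    def decorator(sampler):
        @wraps(sampler)
        def wrapper(*args, **kwargs):
            it = iter(sampler(*args, **kwargs))
            while batch := list(islice(it, batch_size)):
                yield batch

        return wrapper

    return decorator


@batched(batch_size=32)
def uniform_sampler(indices):
    # Hypothetical sampler yielding one index at a time.
    yield from indices
```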

@kosmitive (Contributor, Author) commented Sep 17, 2023

> @kosmitive I'm going to merge your PR (after some changes to documentation and deprecation notices that I've pushed), but I think batching belongs in the samplers. A simple way to do it is to wrap them in a generic batched decorator, like this one I wrote as a demo. I can add it some other time, if we decide to keep the batching mechanism.

Okay, when I think about it, the sampler is indeed the clearer location. Furthermore, the decorator is way cleaner. But if we remove it later, it might not be best to have added it so deeply in the framework. I leave the PR for editing and merging to you, @mdbenito. If you need some work from my side, let me know.

@AnesBenmerzoug We could transfer it indirectly to the spawned processes by introducing a stop queue, which could also carry the speed recommendation of each worker.
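One possible reading of the stop-queue idea, with hypothetical names throughout: a shared stop event plus a feedback queue for per-sample timings:

```python
# Hypothetical sketch: workers check a shared stop event between samples and
# report per-sample timings on a feedback queue, which the caller could use
# both to stop early and to tune batch sizes.
import time


def worker_loop(tasks, feedback, stop_event, compute_marginal):
    while not stop_event.is_set():
        task = tasks.get()
        if task is None:  # sentinel: no more work
            break
        start = time.perf_counter()
        result = compute_marginal(task)
        feedback.put((result, time.perf_counter() - start))
```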

@mdbenito (Collaborator)

> Okay, when I think about it, the sampler is indeed the clearer location. Furthermore, the decorator is way cleaner. But if we remove it later, it might not be best to have added it so deeply in the framework.

Good point, although it only shifts the convention for samplers to return lists of samples instead of single items, which is not so bad. Whether or not using a decorator is a good idea is less clear: for instance, we don't have explicit type information about the batched samplers, and would have to create a protocol or maybe wrap the samplers at runtime, although yuk... Anyway, we can think about that some other time.
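For the type-information point, the kind of Protocol alluded to might look like this, with `Sample` as a hypothetical placeholder type:

```python
# Sketch of explicit typing for batched samplers; Sample is a placeholder.
from typing import Iterator, List, Protocol


class Sample:
    ...


class BatchedSampler(Protocol):
    """Anything that iterates over batches (lists) of samples."""

    def __iter__(self) -> Iterator[List[Sample]]:
        ...
```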
