Add hack that allows specifying the user directory as git #649

Draft
wants to merge 1 commit into
base: master
from

Conversation

PGijsbers
Collaborator

@PGijsbers PGijsbers commented Nov 17, 2024

Minimal hack to easily define frameworks and configurations remotely (e.g., on a GitHub repository).
The repository must have all the files you would expect for adding a custom framework locally, like this one. The hack simply checks whether the user directory is a git repository; if so, it clones the repository first and then uses the local clone as the user directory.

python runbenchmark.py ExplainableBoostingMachine -u https://github.com/pgijsbers/amlb-template.git
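
The behaviour described above could be sketched roughly as follows. This is not the code in this PR: the helper names (`looks_like_git_url`, `resolve_user_dir`), the URL heuristic, and the `downloads` default are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def looks_like_git_url(user_dir: str) -> bool:
    """Heuristic (assumption): treat http(s) URLs ending in .git as repositories."""
    return user_dir.startswith(("http://", "https://")) and user_dir.endswith(".git")


def resolve_user_dir(user_dir: str, clone_root: str = "downloads", run=subprocess.run) -> Path:
    """Return a local user directory, cloning `user_dir` first if it is a git URL."""
    if not looks_like_git_url(user_dir):
        # A plain local path is used as-is.
        return Path(user_dir)
    # Name the clone after the repository: ".../amlb-template.git" -> "amlb-template".
    repo_name = user_dir.rsplit("/", 1)[-1][: -len(".git")]
    target = Path(clone_root) / repo_name
    if not target.exists():
        # Only clone when there is no local copy yet; no pull is attempted here.
        run(["git", "clone", user_dir, str(target)], check=True)
    return target
```

The `run` parameter exists only so the clone step can be swapped out (e.g., in tests); a real integration would probably also need to handle ssh URLs and stale clones.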

@Innixma @eddiebergman Where do you think we should go from here? I think a few things might be nice to have:

  • specifying where repositories should be cloned to through a configuration option (instead of "downloads").
  • automatically pulling the latest changes from the remote
  • allowing branches/commits to be specified
  • some metadata file to communicate compatibility with different AMLB versions (not quite necessary yet, but I hope to revise some of the requirements in the future, e.g., as part of "Defaults to improve framework integration setup" #642 and "Define abstractions for framework integration" #279, some of which require changes to the setup scripts). It would be better to error out early.
  • specify multiple user directories (e.g., you might have a framework integration in one, and some definitions in another)
  • add information to the logs and results file on which framework integration was used.
  • externalize most (if not all?) frameworks from this repository (Should we externalize framework integrations? #571)
  • modifying the frameworks.yaml to allow us to specify integration scripts of frameworks, so that you can python runbenchmark.py FRAMEWORK and it can then automatically fetch the integration for FRAMEWORK if it is one of the "trusted" predefined ones in the shipped frameworks.yaml?
  • allowing just a subdirectory in a repository to define the integration [1]
  • ssh cloning
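
To make the "error out early" idea from the metadata item concrete, here is a minimal sketch. The metadata keys `min_amlb_version` and `max_amlb_version` are hypothetical (no such file format exists yet), and the version parsing only handles plain dotted numbers such as `2.1.7`.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.1.7' into (2, 1, 7); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def check_compatibility(metadata: dict, amlb_version: str) -> None:
    """Raise before any setup work if the integration does not support this AMLB version."""
    current = parse_version(amlb_version)
    minimum: Optional[str] = metadata.get("min_amlb_version")  # hypothetical key
    maximum: Optional[str] = metadata.get("max_amlb_version")  # hypothetical key
    if minimum is not None and current < parse_version(minimum):
        raise RuntimeError(f"Integration requires AMLB >= {minimum}, found {amlb_version}.")
    if maximum is not None and current > parse_version(maximum):
        raise RuntimeError(f"Integration requires AMLB <= {maximum}, found {amlb_version}.")
```

Running such a check right after cloning, and before any framework setup script, would keep the failure cheap.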

Are there any additions or concerns that come to mind? Love to hear your thoughts.

Footnotes

  1. This was a request from... either of you, I think. I am personally not convinced yet, as this will also lead to much slower setup times, especially if you need to fetch multiple commits when navigating to a compatible integration.

@PGijsbers PGijsbers added the enhancement New feature or request label Nov 17, 2024
@eddiebergman
Collaborator

eddiebergman commented Nov 19, 2024

Heyo,

I definitely mentioned [1] as something that would be a nice feature to have for benchmarking other people's algorithms as they intend their setup to be.

I'm on board with most of the features with some additional commentary.

I think one of the priorities would be specifying the commit SHA, optionally even having a recommended one in the framework's own definition (i.e. "if benchmarking, and you want the latest use this version for now while we work on main"). As for the hack of setting it to the user directory, that kind of works, but ultimately the goal of having the .amlb-setup in a repo that others can re-use is just to have the framework definition as well as the setup. The actual datasets and runtime configurations would not be needed and would be defined by the user actually benchmarking the tool with the .amlb-setup (or whatever it would be called).
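
As a rough sketch of the commit pinning idea: the `recommended_ref` key and both function names are hypothetical, and the functions only build the git command lines rather than running them.

```python
from typing import List, Optional


def choose_ref(requested: Optional[str], definition: dict) -> Optional[str]:
    """Prefer the ref the user asked for, else the framework's recommended one (hypothetical key)."""
    return requested or definition.get("recommended_ref")


def git_commands(url: str, target: str, ref: Optional[str] = None) -> List[List[str]]:
    """Build the commands to clone `url` into `target` and optionally pin it to `ref`."""
    commands = [["git", "clone", url, target]]
    if ref is not None:
        # `git checkout` accepts a branch name, a tag, or a commit SHA.
        commands.append(["git", "-C", target, "checkout", ref])
    return commands
```

Each returned command could be passed to `subprocess.run(cmd, check=True)`. Note that a full clone is needed for an arbitrary SHA to be reachable; a shallow clone would not be enough in general.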

As for the slower setup times, I'm not that concerned as usually:

  1. Once the framework is set up, it rarely (if ever) needs to be set up again.
  2. The setup time for downloading the datasets usually dominates.

@PGijsbers
Collaborator Author

PGijsbers commented Nov 20, 2024

Thanks for your thoughts! While specifying specific repo versions with commit hashes remains useful/important, I would say the

"if benchmarking, and you want the latest use this version for now while we work on main"

is what a stable branch or otherwise releases should be for. In practice, I don't think people would want to update commit hashes every time they decide something is stable.

For the subdirectory, other people could also have two repositories: one with the framework, one with the integration. Though I guess that also requires some additional work to ensure the repositories stay in sync, e.g., by adding a check in CI or introducing some versioning constraints in the integration repo.

others can re-use is just to have the framework definition as well as setup

Yes, so the framework integration/definition would either be its own option, or we could allow arbitrarily many user dirs to be defined, where all config files are simply merged in the process (as it currently merges predefined files with the user files).
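
A minimal sketch of what merging configs from several user dirs could look like, assuming configs are already loaded as nested dicts and that later directories take precedence. This is an illustration, not AMLB's actual merge logic, and `merge_configs` is a hypothetical name.

```python
from typing import Any, Dict, Iterable


def merge_configs(configs: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge configs in order; on conflicts, later configs override earlier ones."""
    merged: Dict[str, Any] = {}
    for config in configs:
        for key, value in config.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are mappings: merge recursively into a new dict.
                merged[key] = merge_configs([merged[key], value])
            else:
                # Scalars, lists, and new keys are simply replaced or added.
                merged[key] = value
    return merged


predefined = {"frameworks": {"A": {"version": "1"}}}
integration_dir = {"frameworks": {"B": {"version": "0.3"}}}
definitions_dir = {"frameworks": {"A": {"version": "2"}}}
merged = merge_configs([predefined, integration_dir, definitions_dir])
```

The order of the list then defines precedence, which would need to be documented if multiple user dirs are allowed.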
