Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to add and select converters #89

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

OliverWoolland
Copy link

My summary

The intention of this PR is to (start to) allow runcrate to support executions beyond CWL.

To do this I have added a converter object (base class base.py) from which specialised converters are derived (e.g. cwl.py). I am hoping that by re-implementing the methods in base.py it will be relatively easy for people to add plugins for new languages.

The __init__.py in the converters directory holds a dictionary of available plugins and allow the converter to be selected as a command line option.

Otherwise the changes have been as minimal as possible, with all tests still passing. Only one new test is added as the CLI option defaults to using a CWL converter and as such the tested behaviour has not changed.

Copilot's summary (since it did such a nice job!)

This pull request introduces a new converter option for the CLI tool and refactors the code to support multiple converters. The most important changes include adding the converter option to the CLI, creating a base converter class, and updating tests to accommodate the new converter functionality.

CLI Enhancements:

  • src/runcrate/cli.py: Added a new --converter option to the CLI, allowing users to select from available converters. Updated the convert function to use the specified converter. [1] [2]

Converter Implementation:

Constants Update:

Testing Enhancements:

  • tests/test_cli.py: Added a new test test_cli_convert_with_cwl_converter_set_explictly to verify the CLI functionality with the cwl converter.
  • tests/test_step_mapping.py: Updated tests to use the cwlConverter and added a pytest fixture for the converter. [1] [2]

Copy link

codecov bot commented Dec 11, 2024

Codecov Report

Attention: Patch coverage is 98.24047% with 12 lines in your changes missing coverage. Please review.

Project coverage is 97.80%. Comparing base (fd67bd9) to head (5985eef).
Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
src/runcrate/converters/cwl.py 97.89% 12 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #89      +/-   ##
==========================================
+ Coverage   97.72%   97.80%   +0.07%     
==========================================
  Files          13       16       +3     
  Lines        2066     2139      +73     
==========================================
+ Hits         2019     2092      +73     
  Misses         47       47              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/runcrate/cli.py Outdated Show resolved Hide resolved
src/runcrate/convert.py Outdated Show resolved Hide resolved
@simleo
Copy link
Collaborator

simleo commented Dec 11, 2024

I left a few comments on the code, more observations here:

  • Do you have another use case for the converter, or is this just preparing for something that might never happen? This is quite important since this is a massive refactoring that makes the code harder to understand.
  • The base converter class should be an abstract class (using the abc module) that is never instantiated. Instead, the base converter is instantiated and made available to the API and the CLI, despite being unusable:
runcrate convert -c base -o foo tests/data/revsort-run-1/
[...]
AttributeError: 'converter' object has no attribute 'WORKFLOW_BASENAME'
  • There are only two common functions in the base class, not enough to justify the base converter + specialized converter layout.

  • The CLI still reflects the old CWLProv-only implementation, with help messages like Convert a CWLProv RO bundle into a Workflow Run RO-Crate, RO_DIR: top-level directory of the CWLProv RO etc.

Another use case is needed before attempting something like this, and even then we'd need more common code to justify a base class. Moreover, the base class should fully document all the abstract methods so that a potential implementer of a new converter would know what's expected of each method, and each method should make sense in general, not just in CWLProv.

@OliverWoolland
Copy link
Author

Thanks @simleo!

Do you have another use case for the converter?

Yes, I am actively working on a converter for Galaxy output, collaborators are interested in Snakemake and Airflow

The base converter class should be an abstract class

Makes sense to me! I'll get that done

The CLI still reflects the old CWLProv-only implementation,

True, I'll go hunting for relevant things

Moreover, the base class should fully document all the abstract methods so that a potential implementer of a new converter would know what's expected of each method, and each method should make sense in general, not just in CWLProv.

Absolutely, I'll do my best but I'll have to approach it iteratively

@simleo
Copy link
Collaborator

simleo commented Dec 11, 2024

I am actively working on a converter for Galaxy output

You know that Galaxy has its own Workflow Run RO-Crate implementation? You can download the RO-Crate for a workflow execution from the workflow invocations menu.

@OliverWoolland
Copy link
Author

I am aware! And keeping an eye on development efforts too

@OliverWoolland
Copy link
Author

I am aware! And keeping an eye on development efforts too

To elaborate on this, I notice that their implementation borrows heavily from this library! And it seems undesirable to me to have lots of informal spins of this where things cannot be contributed back

As such I feel quite motivated to move the Galaxy implementation towards using a refactored version of this

@OliverWoolland
Copy link
Author

I think the last few changes address most of your points @simleo :)

I'm going to keep going with a galaxy based example of extension and see how I go. But I'll be taking a break until the new year before that gets finished!

Happy holidays and thanks again for the feedback

@simleo
Copy link
Collaborator

simleo commented Dec 16, 2024

I think the last few changes address most of your points @simleo :)

It's looking better now. Some refs to CWL-specific stuff are still present in the now "general" part, e.g. the docstring of the convert.py module.

One thing I don't understand is the docstring for Converter.get_workflow, especially when it says "The definition should contain...": what does "definition" mean in this context? Why should it contain those items, and in what form? Does this really apply to all converters?

Also, it's a bit odd that from the CONVERTERS dict you get a converter that is not functional until you call populate. It's like populate is the real __init__, which can be surprising.

I'm going to keep going with a galaxy based example of extension and see how I go.

I think this will help a lot in driving the generalization process forward. E.g. what does "provenance bundle" mean in the Galaxy case?

But I'll be taking a break until the new year before that gets finished!
Happy holidays and thanks again for the feedback

Happy holidays to you as well!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants