Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance relabel_from_file to work with any column pair in mapping file #19022

Merged
merged 6 commits into from
Nov 12, 2024

Conversation

wm75
Copy link
Contributor

@wm75 wm75 commented Oct 17, 2024

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

This can save a cut step in WFs.
Will provide tests if you think it's useful.

@github-actions github-actions bot added this to the 24.2 milestone Oct 17, 2024
@jmchilton
Copy link
Member

I don't use Galaxy or follow the latest trends in tool design - so feel free to just dismiss this suggestion if it doesn't reflect how tools are actually used. My gut is though I would implement this as changing:

                <option value="txt">Using lines in a simple text file.</option>
                <option value="tabular">Map original identifiers to new ones using a two column table.</option>

to:

                <option value="txt">Using lines in a simple text file.</option>
                <option value="tabular">Map original identifiers to new ones using the first two columns of a simple table.</option>
                <option value="tabular_configured">Map original identifiers to new ones using arbitrary columns from a table.</option>

And then adding more advanced option in that new selection when - including like comment line skipping and table type options (stuff like break on commas instead of tabs). But that might be overkill.

Either way though - this seems solid and I'm a +1 on it after it has tests and the linting issue is fixed.

@wm75
Copy link
Contributor Author

wm75 commented Oct 21, 2024

And then adding more advanced option in that new selection when - including like comment line skipping and table type options (stuff like break on commas instead of tabs). But that might be overkill.

I hadn't considered such advanced options because, so far, I've only encountered the column issue during WF development.
Now your comment made me think of https://github.com/galaxyproject/iwc/tree/main/workflows/data-fetching/sra-manifest-to-concatenated-fastqs. @lldelisle I seem to remember there was an issue with that one initially about the SRA run data file being csv instead of tab-separated? I think the current patch would already allow simplification of the current WF, but maybe check if there is a missed opportunity for even more here.

@lldelisle
Copy link
Contributor

Hi everyone,
Not sure this would help in this workflow as I need to create a new column anyway...
But this is generally speaking a really good idea.

@@ -1,6 +1,6 @@
<tool id="__RELABEL_FROM_FILE__"
name="Relabel identifiers"
version="1.0.0"
version="1.1.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the versioning of those tools work? Do we need a copy of this file?

@wm75
Copy link
Contributor Author

wm75 commented Nov 11, 2024

So going to try and get this mergeable today for 24.2, but coming back to the versioning question:
is adding the updated version as a copy like in #18385 the way to go here instead of updating the tool in place?

@nsoranzo
Copy link
Member

So going to try and get this mergeable today for 24.2, but coming back to the versioning question: is adding the updated version as a copy like in #18385 the way to go here instead of updating the tool in place?

Yes, this is currently the way for breaking changes.

@wm75
Copy link
Contributor Author

wm75 commented Nov 11, 2024

I don't use Galaxy or follow the latest trends in tool design - so feel free to just dismiss this suggestion if it doesn't reflect how tools are actually used. My gut is though I would implement this as changing:

                <option value="txt">Using lines in a simple text file.</option>
                <option value="tabular">Map original identifiers to new ones using a two column table.</option>

to:

                <option value="txt">Using lines in a simple text file.</option>
                <option value="tabular">Map original identifiers to new ones using the first two columns of a simple table.</option>
                <option value="tabular_configured">Map original identifiers to new ones using arbitrary columns from a table.</option>

So if I go with this part of @jmchilton's suggestion (i.e. hide the new behavior behind a new third select option), that would qualify for a WORKFLOW_SAFE_TOOL_VERSION_UPDATE and not require a new file then, correct?
Because all old workflows could then also run with the new tool version? (If the python code handles all three options correctly, of course)

@wm75
Copy link
Contributor Author

wm75 commented Nov 12, 2024

So, I rewrote this to remain compatibility with the old version of the tool and WFs written with the old version can now be executed without issues also when only the new tool version is installed.

@jmchilton I considered your fancier configuration ideas, but did not implement them here because:

  • specifying header lines to ignore is covered by non-strict mode (although explicit header handling would still be a bit stricter than that)
  • different column separators could be interesting for some use cases, but most other separators are also valid chars in element identifiers, making support for them a questionable general concept. Users who know what they are doing can still run the Cut tool first, which has support for all kinds of input separators and will produce tab-separated output.

@mvdbeek mvdbeek requested a review from jmchilton November 12, 2024 15:19
@jmchilton jmchilton merged commit 6da90b3 into galaxyproject:dev Nov 12, 2024
51 of 53 checks passed
Copy link

This PR was merged without a "kind/" label, please correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants