Pipeline Interface for Prediction tools #204
Replies: 7 comments 5 replies
-
Here is my draft of how the MHC-binding structure could look:
[Figure: draft of the MHC-binding structure]
-
Thanks for sharing the draft. A couple of thoughts on the information you provided: In my opinion, the most important part is the one on the prediction methods and we should have (nf-core) modules for them. Local modules are fine for me as a start.
These prediction tool modules could, for example, be developed completely independently of the other functionality. I am still not sure about the added benefit of using […] when we have allele-handling functionality provided by […].
Regarding the output, I assume you mean the format as we have it now? I think we should stick with it so that we don't end up with different output formats. In general, I still see difficulties when everything is done "string/table"-based, especially with respect to maintaining metadata. However, for peptides this is a smaller issue than for variants and proteins.
-
I'm not sure we need a subworkflow for what is more or less two module calls, but it's okay with me.
For new methods (with new requirements) I see your point. My guess, though, would be that we won't see that much variety in the required allele notation.
Then I did not understand it correctly. :)
-
OK, so if you also think that is not an issue, could we try mhcgnomes for the draft? It is quite handy for the various class II conversions (I did not check mouse alleles in detail, but they are also supported). It also comes with pandas in the container 🙌🏼
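To illustrate the kind of notation conversion being discussed, here is a minimal hand-rolled sketch that normalises class I HLA alleles written in a few common styles to the `HLA-A*02:01` form, and derives an asterisk-free variant such as `HLA-A02:01`, which some predictors expect. The function names and accepted input styles are hypothetical; this is not the mhcgnomes API, and it deliberately ignores class II and mouse alleles, which is exactly where a dedicated library like mhcgnomes earns its keep.

```python
import re

# Hypothetical pattern: accepts class I HLA alleles such as "HLA-A*02:01",
# "HLA-A02:01", "A*02:01" or "A0201" (two-field resolution only).
# Class II and non-human (e.g. mouse) alleles are intentionally NOT handled.
_CLASS_I = re.compile(r"^(?:HLA-)?([ABC])\*?(\d{2}):?(\d{2,3})$")


def normalize_hla_class_i(allele: str) -> str:
    """Return the allele in 'HLA-A*02:01' notation, or raise ValueError."""
    match = _CLASS_I.match(allele.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised class I allele: {allele!r}")
    gene, group, protein = match.groups()
    return f"HLA-{gene}*{group}:{protein}"


def to_asterisk_free(allele: str) -> str:
    """Return the asterisk-free form, e.g. 'HLA-A*02:01' -> 'HLA-A02:01'."""
    return normalize_hla_class_i(allele).replace("*", "")


if __name__ == "__main__":
    for raw in ["HLA-A*02:01", "a0201", "B*07:02"]:
        print(raw, normalize_hla_class_i(raw), to_asterisk_free(raw))
```

Every additional notation or allele class means another pattern to maintain here, which is the argument for delegating this to a library rather than growing hand-written rules.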
One ad-hoc use case we have is mass spec data, where each peptide is annotated with more than 20 columns of search scores and quality measurements. I think wide format can be a bit ugly, yes, but it is easier to read (imo) because you don't need to scan x rows for x predictors to tell whether a peptide is a binder across all predictors. You could also easily compute summary columns (e.g. a consensus of all given predictors).
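A minimal sketch of a long-to-wide pivot with a consensus summary column, in plain Python so it runs without pandas. It assumes hypothetical long-format rows with `sequence`, `allele`, `predictor` and `rank` keys, and a hypothetical binder cutoff of rank ≤ 2.0; neither is taken from the pipeline's actual output format.

```python
# Hypothetical long-format input: one dict per (sequence, allele, predictor)
# with a numeric "rank" value (lower = stronger predicted binding).
def long_to_wide(rows: list[dict], rank_threshold: float = 2.0) -> list[dict]:
    """Pivot to one row per (sequence, allele) with a '<predictor>_rank' column
    per predictor, plus summary columns for a simple majority-vote consensus."""
    predictors = sorted({row["predictor"] for row in rows})
    wide = {}
    for row in rows:
        key = (row["sequence"], row["allele"])
        out = wide.setdefault(key, {"sequence": key[0], "allele": key[1]})
        out[f"{row['predictor']}_rank"] = row["rank"]

    for out in wide.values():
        # Predictors without a result for this peptide/allele get None.
        for p in predictors:
            out.setdefault(f"{p}_rank", None)
        ranks = [out[f"{p}_rank"] for p in predictors if out[f"{p}_rank"] is not None]
        n_binder = sum(rank <= rank_threshold for rank in ranks)
        out["n_predictors_binder"] = n_binder
        # Majority vote among the predictors that returned a value.
        out["consensus_binder"] = n_binder > len(ranks) / 2
    return list(wide.values())


if __name__ == "__main__":
    toy = [  # made-up values, for illustration only
        {"sequence": "PEPTIDEA", "allele": "HLA-A*02:01", "predictor": "p1", "rank": 0.5},
        {"sequence": "PEPTIDEA", "allele": "HLA-A*02:01", "predictor": "p2", "rank": 3.0},
        {"sequence": "PEPTIDEB", "allele": "HLA-A*02:01", "predictor": "p1", "rank": 10.0},
    ]
    for record in long_to_wide(toy):
        print(record)
```

With pandas in the container, the pivot itself would typically be a `DataFrame.pivot_table` call; the loop version just makes the handling of missing predictor results (filled with `None`) explicit.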
-
✅ But for me this is also an independent unit of development that is not necessarily tied to the story of having the prediction tool interface within the pipeline. So it could, e.g., be done in a separate task/PR to keep the number of changes and newly introduced features in a single PR lower.
I think we even changed it back (to the long format) based on feedback from people who were doing the peptide selection in Excel. My suggestion would be, as you also mentioned earlier, to have a switch that one can use to change to wide format […]
-
✅ Maybe a quick survey in the nf-core Slack channel could help decide on the default?
Ah yes, sorry! Another cosmetic request/suggestion I'd like to throw out here to reduce the number of (unnecessary) columns: should we keep the boolean "binder" column? Also, do we need the affinities in the final output, given that everyone has been using the rank metric for quite some time? If not, we could keep the affinity scores in each predictor's module output but leave them out of the final harmonised results. That might spare the user overwhelming detail, and anyone still interested in the affinities could look them up in the "raw" prediction results.
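If the affinities and the boolean binder flag are dropped from the harmonised table, one low-effort approach is to write the final output through a fixed column whitelist and let everything else fall away, leaving the raw per-predictor files untouched. A minimal sketch using Python's `csv.DictWriter` with `extrasaction="ignore"`; the column names in `FINAL_COLUMNS` and the function name are hypothetical.

```python
import csv
import io

# Hypothetical column set for the final, harmonised long-format table.
# Affinity and the boolean "binder" flag are deliberately left out; they
# remain available in each predictor's raw module output.
FINAL_COLUMNS = ["sequence", "allele", "predictor", "rank"]


def write_harmonised(rows, handle) -> None:
    """Write rows as TSV, silently dropping any key not in FINAL_COLUMNS."""
    writer = csv.DictWriter(
        handle,
        fieldnames=FINAL_COLUMNS,
        delimiter="\t",
        extrasaction="ignore",  # ignore extra keys such as "affinity"
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    raw = [  # made-up values, for illustration only
        {"sequence": "PEPTIDEA", "allele": "HLA-A*02:01", "predictor": "p1",
         "rank": 0.5, "affinity": 120.0, "binder": True},
    ]
    buf = io.StringIO()
    write_harmonised(raw, buf)
    print(buf.getvalue(), end="")
```

Keeping the whitelist in one place also makes it easy to add the affinity column back behind an option if users ask for it.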
-
Sure, one vote per lab? 😄
Hm, I currently don't see any problem with the […]
-
This discussion is about a potential interface for prediction tools within the pipeline and the associated changes, such as the corresponding modules, that it would require.