-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decrease size of docker image for Ewing's module #884
Comments
After looking into this more there are a few options we can take. For some added context, we currently have two separate workflows that are being run in the Ewing's module for CI with another workflow on the way.
Note that we are going to be adding a workflow to assign and evaluate clusters that will depend on the output from the AUCell workflow and only use R. After running the CNV workflow on a subset of samples we pretty much stopped using CellAssign as the tumor cell assignments didn't make sense. We did see that CopyKAT and InferCNV worked decently a small portion of the time but we were having trouble with it once we expanded to use more samples. Because of this we turned to using Based on this information I think we have two options to consider: Option 1: The easiest option would be to remove running Option 2: We could keep all the workflows as is but create two separate docker images. Because the CNV annotation workflow uses both R and Python we would need to also create two separate Option 3: We could move running the python script for CellAssign out of the Both option 2 and option 3 are more involved than option 1, but also are more reproducible and help maintain everything being run in CI. Because of that I think option 2 would be the best approach. With that approach we keep the workflows the same but create separate images for each workflow. Tagging the science team in case any of you have any thoughts or opinions: @jashapiro @jaclyn-taroni @sjspielman |
I agree we wouldn't want to do option 1. Option 2 and 3 seem more labor-intensive than I was hoping, and therefore, I conclude that this is probably not worth doing right now. What we had to do to use the bigger disk on PRs is not (my) ideal solution, but it is workable. |
Option 2 seems the best one to me. If/when we do this, we might also want to consider adding docs for how to manage/name multiple Dockerfiles. Or, this might be a rare enough occurrence that we prefer to deal with it on a case-by-case basis. That said, I don't think Option 1 is the worst temporary solution until there is more time to fully address this. |
I think the benefit of doing this at some point is figuring out how we want to deal with having multiple images in a module. I agree that having some docs/ rules around this would be helpful to establish! That being said, I also agree that this shouldn't be a priority right now given how much work it probably will be. So maybe this is just something to play around with whenever we have some extra time (or develop another module that needs two docker images, whichever comes first). |
I've made a change for now, but if we could make this smaller, that'd be swell. I prefer how we were handling permissions.
Originally posted by @jaclyn-taroni in #881 (comment)
It's probably a good idea to make the Ewing image smaller so we don't have to use the bigger runner for the GHA.
Currently, we have two versions of the tumor cell annotation workflow. We are only using one of those moving forward. The workflow that we aren't using includes a conda environment, which we don't really need for the other workflow. So I think we could create a smaller image that doesn't have that conda environment. We should go through the code again and double check that this is a reasonable approach.
The text was updated successfully, but these errors were encountered: