Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resource optimisation #92

Merged
merged 19 commits into from
Nov 24, 2023
Merged

Resource optimisation #92

merged 19 commits into from
Nov 24, 2023

Conversation

muffato
Copy link
Member

@muffato muffato commented Nov 16, 2023

I merged #90 by accident, so reopening a new PR.

Closes #16, #18, #20, #91

Like in sanger-tol/readmapping#82 the goal is to stop using the process_* labels and instead optimise the resource requests of every process.

I'm using the same dataset: 10 genomes of increasing size, with 1 Hi-C library and 1 PacBio library each.

  Fasta size (bytes) PacBio size (# reads) Hi-C (# reads)
GCA_939531405.1 13,824,461 1,546,435 955,654,834
GCA_937625935.1 26,683,271 189,202 980,890,138
GCA_951394315.1 58,010,196 1,965,084 704,258,466
GCA_947172415.1 118,858,594 799,796 87,833,110
GCA_910589235.2 232,212,321 1,586,931 727,465,652
GCA_949987625.1 417,566,504 2,211,570 705,705,280
GCA_946406115.1 810,357,340 1,872,695 842,629,084
GCA_963513935.1 1,803,897,959 7,338,871 3,305,634,916
GCA_951213105.1 3,609,437,155 1,121,856 3,127,898,040
GCA_946902985.2 9,152,113,672 1,537,548 886,707,886

I found much less correlation than in the read-mapping pipeline. The only input size that I found useful was the genome size, now collected at the beginning of the pipeline and added to the meta map. There is some correlation between the number of Hi-C reads and some process runtime, but not memory usage. Since runtime estimates don't need to be very accurate (really, it's only normal/long/week that matters), I don't even pull that input size.

I am using helper functions to grow values (like the number of CPUs) in a logarithm fashion. In effect, this is to limit the increase of the number of CPUs, especially as the advantage of multi-threading tends to decrease with a higher number of threads.

Also:

  1. I could replace the GrabFiles process with some Groovy magic, as per https://community.seqera.io/t/is-it-bad-practice-to-try-and-pluck-files-from-an-element-in-a-channel-that-is-a-directory-with-channel-manipulation/224/2 . This saves 1 LSF job.
  2. I adjusted the GNU_SORT parameters to fix Leftover files in /tmp from the sort commands #91

In this PR, the new resource requirements make every process succeed at the first attempt. The formulas are the lowest legible-ish correlations I could find.

Metric Before After Improvement
Total memory requested (GB) 3,660.0 997.1 ÷3.7
Memory efficiency (used/requested, %) 21.2 78.0
Total memory reservation (GB-hours) 2,792.4 2,405.4 ÷1.2
Memory reservation efficiency (used/requested, %) 86.0 89.4
Total CPUs requested (n) 610.0 510.0 ÷1.2
CPU efficiency (used/requested, %) 47.6 60.5
Total CPU reservation (CPU-hours) 465.4 332.0 ÷1.4
CPU reservation efficiency (used/requested, %) 62.0 86.3
Job failures (%) 0.5 0.0

Detailed charts showing the memory/CPU/time used/requested for every process: before (PDF), after (PDF)

If we want to tolerate processes failing at the first attempt, being resubmitted once or twice to finally complete, I'm sure some requirements may be lowered even more. We would have to make sure that the resources wasted on those first attempts doesn't outweigh the savings we'll make on other processes. Something to investigate later...

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@muffato muffato added the enhancement Improvement of the existing features label Nov 16, 2023
@muffato muffato added this to the 1.1.0 milestone Nov 16, 2023
@muffato muffato self-assigned this Nov 16, 2023
@muffato muffato marked this pull request as ready for review November 16, 2023 21:43
Copy link

github-actions bot commented Nov 16, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit a5d21f4

+| ✅ 132 tests passed       |+
#| ❔  20 tests were ignored |#
!| ❗   1 tests had warnings |!

❗ Test warnings:

❔ Tests ignored:

  • files_exist - File is ignored: assets/nf-core-genomenote_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-genomenote_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-genomenote_logo_dark.png
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: conf/igenomes.config
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File does not exist: .github/ISSUE_TEMPLATE/config.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File does not exist: assets/nf-core-genomenote_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-genomenote_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-genomenote_logo_dark.png
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
  • actions_ci - actions_ci
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/genomenote/genomenote/.github/workflows/awstest.yml

✅ Tests passed:

Run details

  • nf-core/tools version 2.8
  • Run at 2023-11-23 09:25:44

@muffato
Copy link
Member Author

muffato commented Nov 17, 2023

Now that I've generated all the charts, I realise that some resource requirements are actually too low ! I should be asking 150 MB for MultiQC, not 50 MB. I guess it worked because the jobs are too fast for MEMLIMIT to have time kill.

Some COOLER_ZOOMIFY processes also take more than the 12 GB I'm requesting. The processes take 10 min, so I thought MEMLIMIT would kick in ? Anyway, I'll sort all those things out in another commit

@muffato muffato linked an issue Nov 20, 2023 that may be closed by this pull request
@muffato muffato modified the milestones: 1.1.0, 2.0.0, 1.2.0 Nov 20, 2023
@muffato muffato linked an issue Nov 20, 2023 that may be closed by this pull request
This was linked to issues Nov 20, 2023
Copy link
Collaborator

@BethYates BethYates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All changes look good to me, the groovy code to replace the GrabFiles process is a nice improvement

muffato added a commit that referenced this pull request Nov 23, 2023
@muffato
Copy link
Member Author

muffato commented Nov 23, 2023

I've made the few changes I mentioned in #92 : just updated up / down some requirements. I reran the pipeline on all species and it worked fine.

I also merged the dev branch in to solve the conflict coming from #93

@BethYates : this PR just needs an approval and then I can merge it

@muffato muffato mentioned this pull request Nov 23, 2023
9 tasks
@muffato muffato merged commit 48f7738 into dev Nov 24, 2023
6 checks passed
@muffato muffato deleted the resource_optimisation branch November 24, 2023 10:44
@muffato muffato modified the milestones: 1.2.0, 1.1.0 Apr 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improvement of the existing features
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Leftover files in /tmp from the sort commands Large test Medium test Small test
2 participants