Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: BWA MEM 2 support #81

Open
wants to merge 1 commit into
base: ReleaseBranch_1.2.73-2
Choose a base branch
from
Open

Conversation

vinjana
Copy link
Member

@vinjana vinjana commented Mar 17, 2022

TODO

  • Unclear how to change the resource requirements. BWA MEM 2 needs more memory and can use more cores (with advantage for the processing time), but that could mean that we have to change the resource sets. Many people actually use the "xl" resource set size for historical reasons. There is no "xxl" set, though.

Summary by CodeRabbit

New Features:

  • Enhanced the BWA module in the analysis tools to automatically infer the module name and binary filename from the BWA version number, improving workflow efficiency.
  • Added a new feature to the "bisulfiteCoreAnalysis" workflow that checks the BWA version and adjusts it to include the "-bisulfite" suffix if necessary, ensuring accurate analysis.
  • Updated the "qcAnalysis" and "exomeAnalysis" workflows to set the BWA binary based on the use of accelerated hardware, enhancing performance.

Refactor:

  • Streamlined the logic for loading the BWA module and setting the BWA binary, making the code more efficient and easier to maintain.

@vinjana vinjana changed the title First adaptations. BWA MEM 2 support Mar 18, 2022
@vinjana vinjana changed the title BWA MEM 2 support Draft: BWA MEM 2 support Mar 29, 2022
@vinjana
Copy link
Member Author

vinjana commented Apr 21, 2022

Next step: Try to convert the now inoperable "accelerated" job, that used to be used for Convey and later Bluebee FPGA implementations of bwa, for BWA-MEM2. This would allow using a dedicated set of resource sets for bwa-mem2.

@vinjana
Copy link
Member Author

vinjana commented Oct 31, 2023

@coderabbitai review

Copy link

coderabbitai bot commented Oct 31, 2023

Walkthrough

The changes primarily focus on enhancing the BWA module's adaptability in the tbi-lsf-cluster.sh script. The modifications allow the script to infer the BWA module name and binary filename from the BWA version number. This update also includes specific logic for different workflow IDs, adjusting the BWA version and binary based on the workflow and hardware acceleration usage.

Changes

File Path Summary
.../qcPipeline/environments/tbi-lsf-cluster.sh The changes introduce logic to infer the BWA module name and binary filename from the BWA version number. For the "bisulfiteCoreAnalysis" workflow, the BWA version is set to include the suffix "-bisulfite", and the BWA module and binary are set accordingly. For the "qcAnalysis" and "exomeAnalysis" workflows, the BWA module and binary are set based on the use of accelerated hardware.
Validation with GitHub issue(Beta)

Assessment of the PR changes against GitHub Issue

Aspect Aligns with the linked issue Explanation
BWA module and binary filenames changes ✔️ The pull request introduces changes related to the BWA module and binary filenames, which aligns with the requirement to update the binary name and module load for bwa-mem2.
Logic to infer the module name and binary filename from the BWA version number ✔️ This change is not explicitly mentioned in the issue, but it seems to be a reasonable approach to support different versions of BWA, including bwa-mem2.
Workflow-specific BWA version and module loading ✔️ The pull request includes workflow-specific logic for setting the BWA version and loading the BWA module, which could be necessary for supporting bwa-mem2 in different workflows.
Support for bwa-mem2 The pull request does not explicitly mention support for bwa-mem2, which is the main requirement in the GitHub issue.
Path to the FASTA file for bwa-mem2 reference genomes The pull request does not mention changes related to the path to the FASTA file for working with bwa-mem2 reference genomes, which is a requirement in the GitHub issue.
 </details><!-- This is an auto-generated comment: raw summary by coderabbit.ai -->

Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • If you reply to a review comment from CodeRabbit, the bot will automatically respond.
  • To engage with CodeRabbit bot directly around the specific lines of code in the PR, mention @coderabbitai in your review comment
  • Note: Review comments are made on code diffs or files, not on the PR overview.
  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai help to get help.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.json

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 339f717 and 466e8a5.
Files ignored due to filter (1)
  • resources/configurationFiles/analysisQC.xml
Files selected for processing (1)
  • resources/analysisTools/qcPipeline/environments/tbi-lsf-cluster.sh (1 hunks)
Additional comments: 4
resources/analysisTools/qcPipeline/environments/tbi-lsf-cluster.sh (4)
  • 129-160: The logic for determining the BWA module and binary based on the version number and workflow ID seems sound. However, it's important to ensure that the BWA_VERSION environment variable is always set correctly before this script is run. Also, the script assumes that the BWA version number always starts with a major version of "0" or "2". If BWA releases a version with a different major version number, this script will fail. Consider adding a default case to handle unexpected major version numbers.
* 132-141: + # Try to infer the module name and binary filename from the version number.
+ bwaMajorVersion="$(echo "$BWA_VERSION" | cut -f 1 -d '.')"
+ if [[ "$bwaMajorVersion" == "0" ]]; then
+   export BWA_MODULE=bwa
+ elif [[ "$bwaMajorVersion" == "2" ]]; then
+   export BWA_MODULE=bwa-mem2
+ else
+   echo "Cannot infer BWA_MODULE and BWA_BINARY from BWA_VERSION: '$BWA_VERSION'" >> /dev/stderr
+   # Consider a default case here
+   exit 1
+ fi
  • 143-160: The script checks if the workflow ID is "bisulfiteCoreAnalysis" and if so, it appends a "-bisulfite" suffix to the BWA version. It also checks if the workflow ID is "qcAnalysis" or "exomeAnalysis" and if so, it loads the BWA module and sets the BWA binary based on whether accelerated hardware is being used. This logic seems sound, but it's important to ensure that the workflow ID and the useAcceleratedHardware variable are always set correctly before this script is run.
* 143-160: + if [[ "$WORKFLOW_ID" == "bisulfiteCoreAnalysis" ]]; then
+     if [[ "$bwaMajorVersion" != "1" ]]; then
+       echo "Only bwa-0.* has a bisulfite patch module" >> /dev/stderr
+       exit 1
+     fi
+     ## For bisulfite alignment, we suffix the the value of BINARY_VERSION by '-bisulfite', because that's the name in LSF cluster.
+     export BWA_VERSION="${BWA_VERSION:?BWA_VERSION is not set}-bisulfite"
+     # This works with BWA_MODULE because the bwa-mem2 binary has the same interface.
+     moduleLoad "$BWA_MODULE" BWA_VERSION
+     export BWA_BINARY="$BWA_MODULE"
+ elif [[ "$WORKFLOW_ID" == "qcAnalysis" || "$WORKFLOW_ID" == "exomeAnalysis" ]]; then
+     if [[ "${useAcceleratedHardware:-false}" == false ]]; then
+         moduleLoad "$BWA_MODULE" BWA_VERSION
+         export BWA_BINARY="$BWA_MODULE"
+     elif [[ "${useAcceleratedHardware:-true}" == true ]]; then
+         moduleLoad bwa-bb BWA_VERSION
+         export BWA_ACCELERATED_BINARY=bwa-bb
+     fi
+ fi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant