Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement minimap2-based Hi-C alignments #97

Closed
muffato opened this issue Jul 15, 2024 · 4 comments · Fixed by #113
Closed

Implement minimap2-based Hi-C alignments #97

muffato opened this issue Jul 15, 2024 · 4 comments · Fixed by #113
Assignees
Labels
enhancement Improvement of the existing features

Comments

@muffato
Copy link
Member

muffato commented Jul 15, 2024

bwa-mem2 can't deal with extra-large genomes. Instead, we can use minimap2 like in the Treeval and genomeassembly pipelines, cf sanger-tol/genomeassembly#44

@muffato muffato added the enhancement Improvement of the existing features label Jul 15, 2024
@reichan1998
Copy link
Contributor

reichan1998 commented Aug 30, 2024

Hi @tkchafin,

While adapting hic_mapping.nf from Treeval, I encountered some issues:

  1. The command for creating containers using local ncontainers=$(zcat "${realcrai}" | wc -l) returned 0, which caused the chunking to fail (source code). I checked with the output .crai file generated from our pipeline, but it still returned 0.
    As a workaround, I changed the line to local ncontainers=$(cat "${realcram}" | wc -l), which worked, but I'm still concerned about the correct usage of CRAM and CRAI files in this context.

  2. Using the original input Hi-C CRAM file resulted in the error Error opening CRAM file '35528_2%231.subset.cram' in the process CRAM_FILTER_MINIMAP2_FILTER5END_FIXMATE_SORT. I suspect this is due to missing targets in the header. My attempted fix was to convert the original CRAM using Samtools view:

samtools  view  --threads 1  --reference GCA_922984935.2.subset.unmasked.fasta  -C --output-fmt cram --write-index  -o mMelMel3_T1.cram  35528_2%231.subset.cram 

When checking the original CRAM file and the converted file with samtools view -c, both returned the value 19984. However, when comparing to the output CRAM before adding the subworkflow, the results differed.

Before modification (from our latest dev branch):

samtools view -c /Users/aaaz/sanger-tol/readmapping_3/test/read_mapping/hic/GCA_922984935.2.subset.unmasked.hic.mMelMel3.cram
10125

After modification:

samtools view -c /Users/aaaz/sanger-tol/readmapping_3/test_hic_minimap2/read_mapping/hic/GCA_922984935.2.subset.unmasked.hic.mMelMel3.cram
19984

Given these discrepancies, I’m not entirely confident that my attempted fix is correct. I’d really appreciate your advice on this. If you’d like to take a closer look, my latest commit should have all the details. Sorry for the long message—I just wanted to make sure I covered everything!

@tkchafin
Copy link
Contributor

tkchafin commented Sep 2, 2024

Hi @reichan1998 I am investigating it now. The first issue I have found is that our input file is NOT actually a CRAM file. It is a SAM file named with the incorrect extension... I've uploaded corrected files to the remote repository.

@tkchafin
Copy link
Contributor

tkchafin commented Sep 2, 2024

Sent a PR (reichan1998#1) to your cram_handling branch with some fixes, in addition to correcting the test file:

  1. Switched to dedicated nf-core samtools/index module (+changes in modules.config)
  2. Changed a bit how the channels are constructed in align_short_hic.nf
  3. Changed input declaration for generate_cram_csv.nf module

I have steps after CSV generation commented out for now, but that at least seems to be working.

@reichan1998
Copy link
Contributor

Thank you so much for investigating and fixing the issues! I see now why things were not working—no wonder with a SAM file masquerading as a CRAM! 😅
I have checked your code, and everything is fine now. I am going to proceed with the next steps. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improvement of the existing features
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants