Merquryfk bug #157

Open
muffato opened this issue Dec 6, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@muffato
Member

muffato commented Dec 6, 2024

Description of the bug

Taking /lustre/scratch122/tol/data/5/a/3/5/2/0/Ypsolopha_sequella (/lustre/scratch124/tol/projects/darwin/data/insects/Ypsolopha_sequella) as an example, and using the k-mer database genomic_data/ilYpsSequ2/pacbio/kmer/k31/ilYpsSequ2.k31.ktab, the completeness values are as follows:

  • completeness of assembly/release/ilYpsSequ2.1/insdc/GCA_934047225.1.fasta.gz:
    Assembly        Region  Found   Total   % Covered
    ilYpsSequ2.1.GCA_934047225.1    all     659008770       659016853       100.00
    
  • completeness of assembly/release/ilYpsSequ2.1_alternate_haplotype/insdc/GCA_934041175.1.fasta.gz:
    Assembly        Region  Found   Total   % Covered
    ilYpsSequ2.1_alternate_haplotype.GCA_934041175.1        all     659008726       659016853       100.00
    
  • expected completeness, obtained by passing both Fasta files to MerquryFK in a single run:
    Assembly        Region  Found   Total   % Covered
    ilYpsSequ2.1.GCA_934047225.1    all     524725644       659016853       79.62
    ilYpsSequ2.1_alternate_haplotype.GCA_934041175.1        all     513343328       659016853       77.90
    both    all     644539455       659016853       97.80
    

The bug is in MerquryFK, not in the pipeline itself, but until it is fixed, the completeness value reported for a single assembly cannot be trusted.
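As a quick sanity check on the numbers above, the sketch below recomputes the percentages and flags the inconsistency. It assumes that "% Covered" is simply Found / Total × 100 rounded to two decimals, which matches every row quoted in this issue; the helper names `pct` and `exceeds_union` are made up for illustration.

```shell
# pct FOUND TOTAL: print the completeness percentage with two decimals
# (assumption: % Covered = 100 * Found / Total)
pct() {
    awk -v f="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100 * f / t }'
}

# exceeds_union SINGLE BOTH: succeeds when a single-assembly "Found" count is
# larger than the "both" count of the joint run, which should be impossible
# because the k-mers found in one assembly are a subset of those found in either
exceeds_union() {
    [ "$1" -gt "$2" ]
}

pct 659008770 659016853   # primary alone       -> 100.00
pct 524725644 659016853   # primary, joint run  -> 79.62
pct 644539455 659016853   # both, joint run     -> 97.80

if exceeds_union 659008770 644539455; then
    echo "single-assembly count exceeds the joint 'both' count: suspicious"
fi
```

The single-assembly run finds more k-mers (659008770) than the joint run finds in both assemblies combined (644539455), so the single-assembly figure is the one that is off.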

Command used and terminal output

In a directory, create symbolic links to:

  • /lustre/scratch122/tol/data/5/a/3/5/2/0/Ypsolopha_sequella/genomic_data/ilYpsSequ2/pacbio/kmer/k31/ilYpsSequ2.{hist,ktab}
  • /lustre/scratch122/tol/data/5/a/3/5/2/0/Ypsolopha_sequella/genomic_data/ilYpsSequ2/pacbio/kmer/k31/.ilYpsSequ2.ktab.*
  • /lustre/scratch122/tol/data/5/a/3/5/2/0/Ypsolopha_sequella/assembly/release/ilYpsSequ2.1/insdc/GCA_934047225.1.fasta.gz
  • /lustre/scratch122/tol/data/5/a/3/5/2/0/Ypsolopha_sequella/assembly/release/ilYpsSequ2.1_alternate_haplotype/insdc/GCA_934041175.1.fasta.gz
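The links listed above could be created along these lines. This is only a sketch: `link_inputs` is a hypothetical helper, and it keeps the original file names, whereas the commands quoted in this issue use names prefixed with the assembly name, so the links may need renaming to match. The hidden `.ilYpsSequ2.ktab.*` files are, as far as I understand, the part files that FastK keeps next to the `.ktab` stub.

```shell
# link_inputs DATA DEST: symlink the k-mer database and both assemblies
# from the species directory DATA into the working directory DEST
link_inputs() {
    data=$1
    dest=$2
    kmer="$data/genomic_data/ilYpsSequ2/pacbio/kmer/k31"
    rel="$data/assembly/release"

    # Histogram and k-mer table stub
    ln -s "$kmer/ilYpsSequ2.hist" "$kmer/ilYpsSequ2.ktab" "$dest/"
    # Hidden part files that sit next to the .ktab stub
    ln -s "$kmer"/.ilYpsSequ2.ktab.* "$dest/"
    # Primary and alternate-haplotype assemblies
    ln -s "$rel/ilYpsSequ2.1/insdc/GCA_934047225.1.fasta.gz" "$dest/"
    ln -s "$rel/ilYpsSequ2.1_alternate_haplotype/insdc/GCA_934041175.1.fasta.gz" "$dest/"
}

# Example call (path taken from the issue):
# link_inputs /lustre/scratch122/tol/data/5/a/3/5/2/0/Ypsolopha_sequella "$PWD"
```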

The command is then:

bsub -M24000 -R"select[mem>24000] rusage[mem=24000] span[hosts=1]" -n 6 -q yesterday -Is \
singularity exec --no-home --pid -B /lustre /nfs/treeoflife-01/teams/shared/nextflow/cache/nxf_singularity/quay.io-sanger-tol-fastk-1.0.1-c1.img \
MerquryFK -P. -T6 $PWD/ilYpsSequ2.k31.ktab \
$PWD/ilYpsSequ2.1.GCA_934047225.1.fasta  $PWD/principal

or

(...)
$PWD/ilYpsSequ2.1_alternate_haplotype.GCA_934041175.1.fasta.gz $PWD/alternate

or

(...)
$PWD/ilYpsSequ2.1.GCA_934047225.1.fasta  $PWD/ilYpsSequ2.1_alternate_haplotype.GCA_934041175.1.fasta.gz $PWD/both

Relevant files

No response

System information

No response

muffato added the bug label Dec 6, 2024
@muffato
Member Author

muffato commented Dec 6, 2024

A workaround is to alter the MerquryFK command line to pass the input Fasta twice.

Assembly        Region  Found   Total   % Covered
ilYpsSequ2.1.GCA_934047225.1    all     524725644       659016853       79.62
ilYpsSequ2.1.GCA_934047225.1    all     524725644       659016853       79.62
both    all     524725644       659016853       79.62
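For reference, the workaround amounts to repeating the assembly argument in the MerquryFK call. The sketch below only prints the command rather than running it; `merquryfk_twice` is a made-up helper name, and the `bsub` / `singularity exec` wrapper from the original report is left out for brevity.

```shell
# merquryfk_twice KTAB ASM OUT: print a MerquryFK command line that passes
# the same assembly twice, so the per-assembly row comes from a joint run
merquryfk_twice() {
    ktab=$1
    asm=$2
    out=$3
    printf 'MerquryFK -P. -T6 %s %s %s %s\n' "$ktab" "$asm" "$asm" "$out"
}

merquryfk_twice "$PWD/ilYpsSequ2.k31.ktab" \
    "$PWD/ilYpsSequ2.1.GCA_934047225.1.fasta" "$PWD/principal"
```

With this form, the assembly row and the "both" row report the same value, as in the table above.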

@tkchafin
Contributor

tkchafin commented Dec 9, 2024

Do we want this fixed in 2.1.0? I could patch the module this afternoon. Otherwise, there's nothing else we are currently waiting on for that release.

@tkchafin
Contributor

Logging for posterity that in 2.1.0 we will remove the completeness metric.
