ValueError: Missing variants frozenset #393

Jorisvansteenbrugge · 2024-11-29T12:15:48Z

Description of the bug

When running the pipeline with --run_ancestry, an error occurs during the process 'PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:SCORE_AGGREGATE (reference)'. The full command error is listed below, but the key part of the error message is:

File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/calc/cli/aggregate_cli.py", line 50, in verify_variants
      raise ValueError(f"Missing variants {diff}")
  ValueError: Missing variants frozenset({'5:53383709:C:CAA', '4:125831837:A:AATAT'})

I have tested the pipeline before with another dataset, which did not result in the error. Similarly, running the pipeline on the current dataset without ancestry analysis does not result in the error.

Do you have any idea what might be causing this issue?

Command used and terminal output

$  nextflow run pgsc_calc/main.nf -c slurm.config -profile slurm --input /path/to/output_samplesheet.csv --pgs_id PGS000004 --target_build GRCh38 --outdir testOutdir --keep_multiallelic true --keep_ambiguous true --run_ancestry /path/to/pgsc_HGDP+1kGP_v1.tar.zst


Command error:

  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  pgscatalog.calc.cli.aggregate_cli: 2024-11-29 12:26:33 INFO     Checking variant overlap
  pgscatalog.calc.cli.aggregate_cli: 2024-11-29 12:26:33 INFO     Read 295 from reference_ALL_additive_0.sscore.vars
  pgscatalog.calc.cli.aggregate_cli: 2024-11-29 12:26:33 INFO     Read 297 from reference_ALL_additive_0.scorefile.gz
  Traceback (most recent call last):
    File "/app/pgscatalog.utils/.venv/bin/pgscatalog-aggregate", line 8, in <module>
      sys.exit(run_aggregate())
               ^^^^^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/calc/cli/aggregate_cli.py", line 76, in run_aggregate
      [verify_variants(x) for x in score_paths]
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/calc/cli/aggregate_cli.py", line 76, in <listcomp>
      [verify_variants(x) for x in score_paths]
       ^^^^^^^^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/calc/cli/aggregate_cli.py", line 50, in verify_variants
      raise ValueError(f"Missing variants {diff}")
  ValueError: Missing variants frozenset({'5:53383709:C:CAA', '4:125831837:A:AATAT'})

Work dir:
  /hpc/diaggen/users/joris/PRS_data/pgs_calc/work/56/e9595f52fdfeef44e9b06bf84af104

Container:
  /hpc/diaggen/software/singularity_cache/ghcr.io-pgscatalog-pygscatalog-pgscatalog-utils-1.4.4-singularity.img



### Relevant files

[nextflow.log](https://github.com/user-attachments/files/17959011/nextflow.log)


### System information

Nextflow version:  24.10.1
HPC
slurm
singularity
Rocky 8.10


nebfield · 2024-12-02T11:57:19Z

Thanks for the bug report!

This error happens when the variants that are output by the pgscatalog-match process don't perfectly match the variants used by plink to calculate the scores. We always want to make sure these two variant sets are perfectly consistent.
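To illustrate the kind of check that fails here, a rough sketch follows. It is not the actual pgscatalog code: the function names are made up, the ID `1:1000:A:G` is a placeholder, and it assumes the check is a one-directional set difference between the matched variant IDs (scorefile) and the IDs plink reports as used (.sscore.vars).

```python
def missing_variants(matched_ids, scored_ids):
    """Return IDs that were matched but not used for scoring.

    matched_ids: variant IDs from the match output (hypothetical input)
    scored_ids:  variant IDs reported as used in the score calculation
    """
    return frozenset(matched_ids) - frozenset(scored_ids)


def check_consistency(matched_ids, scored_ids):
    """Raise if any matched variant is absent from the scored set."""
    diff = missing_variants(matched_ids, scored_ids)
    if diff:
        raise ValueError(f"Missing variants {diff}")


# The two indels are taken from the error message in this issue;
# "1:1000:A:G" is a placeholder ID for illustration only.
matched = ["1:1000:A:G", "4:125831837:A:AATAT", "5:53383709:C:CAA"]
scored = ["1:1000:A:G"]

print(sorted(missing_variants(matched, scored)))
# ['4:125831837:A:AATAT', '5:53383709:C:CAA']
```

This would be consistent with the log above, where 297 variants are read from the scorefile, 295 from the .sscore.vars file, and two are reported missing.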

It's interesting this happens with the reference panel. I think it has something to do with the matching parameters --keep_multiallelic and --keep_ambiguous (which are both usually false). Does the error still happen if you remove these parameters?

Jorisvansteenbrugge · 2024-12-05T13:15:52Z

Thank you for taking a look!

The pipeline does run successfully without --keep_multiallelic and --keep_ambiguous set to true. I enabled these two settings to get a slightly higher match rate, as I was testing the pipeline with a small sample set :)

nebfield · 2024-12-05T15:35:54Z

Great 🚀 that's helpful, thank you. I'll leave this issue open to investigate properly and fix in our next release - but that probably won't be until early next year sometime 😅

Jorisvansteenbrugge added the bug (Something isn't working) label Nov 29, 2024

nebfield added this to the v2.1.0 milestone Dec 5, 2024
