### Description of the bug

When running the pipeline with `--run_ancestry`, an error occurs during the process `PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:SCORE_AGGREGATE (reference)`. The full command error is listed below; the key part of the error message is:
```
  File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/calc/cli/aggregate_cli.py", line 50, in verify_variants
    raise ValueError(f"Missing variants {diff}")
ValueError: Missing variants frozenset({'5:53383709:C:CAA', '4:125831837:A:AATAT'})
```
I have tested the pipeline before with another dataset, which did not result in the error. Similarly, running the pipeline on the current dataset without ancestry analysis does not result in the error.
Do you have any idea what might be causing this issue?
### Command used and terminal output
```console
$ nextflow run pgsc_calc/main.nf -c slurm.config -profile slurm --input /path/to/output_samplesheet.csv --pgs_id PGS000004 --target_build GRCh38 --outdir testOutdir --keep_multiallelic true --keep_ambiguous true --run_ancestry /path/to/pgsc_HGDP+1kGP_v1.tar.zst
```

```
Command error:
  INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  pgscatalog.calc.cli.aggregate_cli: 2024-11-29 12:26:33 INFO Checking variant overlap
  pgscatalog.calc.cli.aggregate_cli: 2024-11-29 12:26:33 INFO Read 295 from reference_ALL_additive_0.sscore.vars
  pgscatalog.calc.cli.aggregate_cli: 2024-11-29 12:26:33 INFO Read 297 from reference_ALL_additive_0.scorefile.gz
  Traceback (most recent call last):
    File "/app/pgscatalog.utils/.venv/bin/pgscatalog-aggregate", line 8, in <module>
      sys.exit(run_aggregate())
               ^^^^^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/calc/cli/aggregate_cli.py", line 76, in run_aggregate
      [verify_variants(x) for x in score_paths]
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/calc/cli/aggregate_cli.py", line 76, in <listcomp>
      [verify_variants(x) for x in score_paths]
       ^^^^^^^^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/calc/cli/aggregate_cli.py", line 50, in verify_variants
      raise ValueError(f"Missing variants {diff}")
  ValueError: Missing variants frozenset({'5:53383709:C:CAA', '4:125831837:A:AATAT'})

Work dir: /hpc/diaggen/users/joris/PRS_data/pgs_calc/work/56/e9595f52fdfeef44e9b06bf84af104
Container: /hpc/diaggen/software/singularity_cache/ghcr.io-pgscatalog-pygscatalog-pgscatalog-utils-1.4.4-singularity.img
```
### Relevant files
[nextflow.log](https://github.com/user-attachments/files/17959011/nextflow.log)
### System information
- Nextflow version: 24.10.1
- Hardware: HPC
- Executor: slurm
- Container engine: Singularity
- OS: Rocky 8.10
This error happens when the variants output by the `pgscatalog-match` process don't exactly match the variants plink used to calculate the scores. We always want these two variant sets to be perfectly consistent.
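A minimal sketch of the kind of consistency check described above, in Python. This is not the actual `aggregate_cli.py` code: the function name `check_variant_overlap` is hypothetical, and it assumes variant IDs are `chrom:pos:ref:alt` strings as in the error message. It compares the IDs in the matched scoring file with the IDs plink reports having used, and fails if the two sets differ in either direction:

```python
from collections.abc import Iterable


def check_variant_overlap(scorefile_ids: Iterable[str], plink_used_ids: Iterable[str]) -> None:
    """Hypothetical check: raise ValueError unless both variant ID sets are identical.

    Assumes IDs look like "chrom:pos:ref:alt", e.g. "5:53383709:C:CAA".
    """
    expected = frozenset(scorefile_ids)
    used = frozenset(plink_used_ids)

    # Variants that matching kept but plink did not use for scoring
    missing = expected - used
    if missing:
        raise ValueError(f"Missing variants {missing}")

    # Variants plink used that are absent from the matched scoring file
    unexpected = used - expected
    if unexpected:
        raise ValueError(f"Unexpected variants {unexpected}")


# Illustrative IDs: two indels kept by matching but not used by plink
scorefile = ["1:1000:A:G", "5:53383709:C:CAA", "4:125831837:A:AATAT"]
plink_vars = ["1:1000:A:G"]
try:
    check_variant_overlap(scorefile, plink_vars)
except ValueError as err:
    print(err)
```

With 297 IDs read from the scoring file and 295 from `.sscore.vars`, as in the log in the report, a set difference like this would surface exactly the two leftover variants.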
It's interesting that this happens with the reference panel. I think it has something to do with the matching parameters `--keep_multiallelic` and `--keep_ambiguous` (which are both usually false). Does the error still happen if you remove these parameters?
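For context, a rough sketch of what the two matching options refer to, again assuming `chrom:pos:ref:alt` IDs. The helper names are hypothetical and this is not how `pgscatalog-match` implements them. Strand-ambiguous variants are SNPs whose two alleles are complementary (A/T or C/G), so a strand flip cannot be detected from the alleles alone; multiallelic sites are positions with more than one alternate allele, approximated here as a position that appears in more than one variant ID:

```python
from collections import Counter

# Watson-Crick complements for single-base alleles
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(ref: str, alt: str) -> bool:
    """True for SNPs whose alleles are complementary (A/T, T/A, C/G, G/C)."""
    return len(ref) == 1 and len(alt) == 1 and COMPLEMENT.get(ref) == alt


def multiallelic_sites(variant_ids: list[str]) -> set[tuple[str, str]]:
    """Hypothetical helper: (chrom, pos) pairs seen with more than one allele pair."""
    counts = Counter(tuple(v.split(":")[:2]) for v in variant_ids)
    return {site for site, n in counts.items() if n > 1}


print(is_strand_ambiguous("A", "T"))    # True
print(is_strand_ambiguous("C", "CAA"))  # False: an insertion, not a SNP
print(multiallelic_sites(["5:53383709:C:CAA", "5:53383709:C:CA", "4:1:A:G"]))
# {('5', '53383709')}
```

Both variants in the error message are insertions, which this definition never flags as strand-ambiguous.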
The pipeline does run successfully without `--keep_multiallelic` and `--keep_ambiguous` set to true. I enabled these two settings to get a slightly higher match rate, as I was testing the pipeline with a small sampleset :)
Great 🚀 that's helpful, thank you. I'll leave this issue open to investigate properly and fix in our next release - but that probably won't be until early next year sometime 😅