Error in parsing ${cancer} #44

Open
RyulKim-Inocras opened this issue Nov 24, 2024 · 2 comments

Comments


RyulKim-Inocras commented Nov 24, 2024

Thanks to your help, I succeeded in completing the intogen pipeline on my samples.
But I encountered a small issue that might affect the final output.

The "cohort.tsv" file shows

COHORT CANCER_TYPE PLATFORM MUTATIONS SAMPLES
XXXXX AAAAAA.parsed.tsv.gz
BRCA WGS 847284 1505

But, I think it should be like this

COHORT CANCER_TYPE PLATFORM MUTATIONS SAMPLES
XXXXX BRCA WGS 847284 1505

In other words,
the ${cancer} in intogen.nf should be "BRCA", but it is wrongly parsed into "AAAAAA.parsed.tsv.gz\nBRCA"
(note the line separator between AAAAAA.parsed.tsv.gz and BRCA!)
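One way to spot this kind of breakage is to compare the field count of every row against the header. The following is a minimal sketch, assuming cohort.tsv is tab-separated with a single header line; the function name `malformed_rows` is hypothetical and not part of intogen.

```python
def malformed_rows(tsv_text):
    """Return (line_number, fields) for rows whose field count differs from the header."""
    lines = tsv_text.splitlines()
    header = lines[0].split("\t")
    bad = []
    for number, line in enumerate(lines[1:], start=2):
        fields = line.split("\t")
        if len(fields) != len(header):
            bad.append((number, fields))
    return bad


header = "COHORT\tCANCER_TYPE\tPLATFORM\tMUTATIONS\tSAMPLES\n"
# The row as reported: a stray newline splits one record across two lines.
broken = header + "XXXXX\tAAAAAA.parsed.tsv.gz\nBRCA\tWGS\t847284\t1505\n"
# The row as expected.
expected = header + "XXXXX\tBRCA\tWGS\t847284\t1505\n"

for number, fields in malformed_rows(broken):
    print(f"line {number}: {len(fields)} fields: {fields}")
```

With the reported content, both physical lines of the split record are flagged, while the expected content passes cleanly.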

This leads to an error in the DriverDiscovery step, because the --ctype option became

--ctype AAAAAA.parsed.tsv.gz <- line separator here!!
BRCA

and 'BRCA' is not an executable command.
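The failure mode can be reproduced outside the pipeline. The sketch below is only an illustration: `some_tool` and `render_command` are made-up names, and only the `--ctype` option is taken from the report. It shows how a value with an embedded newline turns one command line into two.

```python
def render_command(ctype):
    """Interpolate a value into a command line, as a script template would."""
    # "some_tool" is a placeholder name; only --ctype appears in the report.
    return f"some_tool --ctype {ctype}"


# The value as it was apparently parsed: file name, newline, cancer type.
parsed_value = "AAAAAA.parsed.tsv.gz\nBRCA"

# A shell reads each line of a script as a separate command.
script_lines = render_command(parsed_value).splitlines()
for line in script_lines:
    print(repr(line))
```

The second line consists of just `BRCA`, which a shell would try to run as a command.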

I worked around this problem by editing the DriverDiscovery step as follows:

--ctype ${cancer} --> --ctype BRCA

But cohort.tsv is also used in the DriverSummary step,
so this seems to affect the final output, including "drivers.tsv": every value in the 'CANCER_TYPE' column is 'AAAAAA.parsed.tsv.gz' and every value in the '%_SAMPLES_COHORT' column is 'NA'.
I think they should be

CANCER_TYPE : 'BRCA' instead of 'AAAAAA.parsed.tsv.gz'
%_SAMPLES_COHORT : the number in the SAMPLES column divided by the total number of samples (1505 in my case)
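As a sketch of the calculation described above (not the pipeline's actual code), the expected value could be computed as follows. The function name `samples_fraction` and the count of 301 are hypothetical. Whether the real column holds this ratio or the ratio multiplied by 100 is not stated here.

```python
def samples_fraction(samples, cohort_total):
    """Fraction of the cohort's samples, as described in the issue text."""
    if cohort_total <= 0:
        raise ValueError("cohort_total must be positive")
    return samples / cohort_total


cohort_total = 1505       # total number of samples reported in cohort.tsv
samples_with_driver = 301  # hypothetical value from a SAMPLES column

fraction = samples_fraction(samples_with_driver, cohort_total)
print(f"{fraction:.3f}")
```

When the cohort total is missing, as happens if the cohort row is malformed, there is nothing to divide by, which would be consistent with the 'NA' values seen in drivers.tsv.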

How can I solve this minor (potentially major?) issue?
Do you think this behavior affects any critical steps in the pipeline other than the 'drivers.tsv' and 'cohort.tsv' files?

FYI, my yaml file looks like this

- type: static
  field: DONOR
  value: '{SAMPLE}'
- type: static
  field: CANCER
  value: BRCA
- type: static
  field: PLATFORM
  value: WGS

Regards,

Member

FedericaBrando commented Nov 25, 2024

Hi @RyulKim-Inocras,

You need to specify the "DATASET" field in the YAML file, for example:

- type: static
  field: DATASET
  value: dataset1_{PLATFORM}_{CANCER}_YYYY
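To make the template above concrete, here is a small Python sketch of how such a value might expand. It assumes the `{PLATFORM}` and `{CANCER}` placeholders are filled from the other annotation fields, which the example suggests but this thread does not confirm. The `expand` helper is hypothetical and is not the pipeline's parser.

```python
def expand(template, fields):
    """Fill {NAME} placeholders from a dict of already-known field values."""
    return template.format(**fields)


# Values taken from the static fields in the reporter's YAML.
fields = {"PLATFORM": "WGS", "CANCER": "BRCA"}

dataset = expand("dataset1_{PLATFORM}_{CANCER}_YYYY", fields)
print(dataset)
```

Under that assumption the DATASET value would become `dataset1_WGS_BRCA_YYYY`; the `YYYY` part is left as written in the example.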

Author

RyulKim-Inocras commented Nov 26, 2024

Dear Federica,

Yes, I do have the DATASET field in my YAML file.

Here is my full metadata.yaml (basically the same as the one in the test folder of intogen).
Please find the attached file. (The file extension was changed from yaml to txt so that it could be uploaded.)

metadata.txt
