This repository has been archived by the owner on Mar 23, 2023. It is now read-only.

Missing input files for rule all: #29

Open
GeoMicroSoares opened this issue Jan 13, 2022 · 5 comments

GeoMicroSoares commented Jan 13, 2022

Hi there @eharr, @Priyesh000, @LynnLy,

I'm getting the following error when running snakemake --use-conda -j 10 --dry-run on my own data, after my test run completed successfully:

Building DAG of jobs...
MissingInputException in line 42 of /mnt/drive/Pore-C-Snakemake/Snakefile:
Missing input files for rule all:
results/basecall/DpnII_run_1.rd.catalog.yaml
results/basecall/DpnII_run_2.rd.catalog.yaml

I really can't figure out how this catalog.yaml file is created, so I'm not sure how to fix this. DpnII is listed by Biopython, so this shouldn't be an issue... Below are my config files - hopefully someone can spot something going wrong?

  • config/basecalls.tsv:
run_id enzyme refgenome_ids biospecimen fastq_path fast5_directory      sequencing_summary_path
run_1 DpnII Bs_p29 NA ../prelim_check.d/run_1712021.fastq     ../raw.d/run_1712021/all_fast5/       ../raw.d/run_1712021/sequencing_summary.txt
run_2 DpnII draft1 NA ../prelim_check.d/run_1712021.fastq     ../raw.d/run_1712021/all_fast5/       ../raw.d/run_1712021/sequencing_summary.txt
  • config/references.tsv:
refgenome_id refgenome_path
Bs_p29 ../prelim_check.d/reference.fasta
draft1 ../prelim_check.d/reference.fasta

I didn't change config/config.yaml or file_layout.yaml, just deleted phased_vcfs.tsv as I'm not using that.

Thanks in advance.

LynnLy (Contributor) commented Jan 13, 2022

Hi @GeoMicroSoares,

The pipeline expects the run_id and enzyme values to contain no underscores, because underscores are used to delimit the wildcards in the output file names. Can you change the run_ids from run_1 and run_2 to run1 and run2?
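
To illustrate why an underscore inside run_id causes trouble, here is a minimal sketch using plain regular expressions. It is not taken from the Pore-C-Snakemake Snakefile: the two patterns and the function name parse_catalog_name are hypothetical stand-ins for a {enzyme}_{run_id}.rd.catalog.yaml output pattern, once with wildcards that exclude underscores (an assumption about how the pipeline might constrain them) and once unconstrained.

```python
import re

# Hypothetical stand-ins for an output pattern of the form
# {enzyme}_{run_id}.rd.catalog.yaml (not copied from the real Snakefile).
# CONSTRAINED assumes each wildcard may not contain an underscore.
CONSTRAINED = re.compile(r"^(?P<enzyme>[^_]+)_(?P<run_id>[^_]+)\.rd\.catalog\.yaml$")
# UNCONSTRAINED lets each wildcard match anything (greedy ".+").
UNCONSTRAINED = re.compile(r"^(?P<enzyme>.+)_(?P<run_id>.+)\.rd\.catalog\.yaml$")


def parse_catalog_name(filename, pattern=CONSTRAINED):
    """Return the wildcard values for a catalog file name, or None if it does not match."""
    match = pattern.match(filename)
    return match.groupdict() if match else None


# Without an underscore in run_id, the name splits cleanly.
print(parse_catalog_name("DpnII_run1.rd.catalog.yaml"))
# With run_1, the constrained pattern cannot match at all...
print(parse_catalog_name("DpnII_run_1.rd.catalog.yaml"))
# ...and the unconstrained pattern splits at the wrong underscore.
print(parse_catalog_name("DpnII_run_1.rd.catalog.yaml", UNCONSTRAINED))
```

Under these assumptions, the requested file either matches no rule or is parsed with the wrong enzyme and run_id. Either outcome would be consistent with the "Missing input files" error above.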

@stasys-hub

@LynnLy Could you specify how and where the run_id should be changed? Just in the folder structure?

LynnLy (Contributor) commented Jul 19, 2022

Hi @stasys-hub,

You shouldn't need to change the name of any existing files. The run_id that you specify in the first column of config/basecalls.tsv determines the names of the output files, and must not contain any underscores. Pore-C-Snakemake will read the file specified in the "fastq_path" column (which can be named anything) to create smaller fastq files with the naming structure: basecall/{enzyme}_{run_id}.rd.{batch_id}.fq.gz.
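
As a quick pre-flight check before running snakemake, something like the following sketch could flag offending values. It is not part of Pore-C-Snakemake: the helper name find_underscore_ids is hypothetical, and the example table is an abridged, tab-separated version of the basecalls.tsv posted above. It only checks the run_id and enzyme columns, since those are the ones named in this thread.

```python
import csv
import io


def find_underscore_ids(tsv_text, columns=("run_id", "enzyme")):
    """Return (line_number, column, value) for every checked value containing '_'."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    problems = []
    # The header is line 1, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        for column in columns:
            value = row.get(column) or ""
            if "_" in value:
                problems.append((line_number, column, value))
    return problems


# Abridged example table (only the first three columns, tab-separated).
example = (
    "run_id\tenzyme\trefgenome_ids\n"
    "run_1\tDpnII\tBs_p29\n"
    "run2\tDpnII\tdraft1\n"
)

for line_number, column, value in find_underscore_ids(example):
    print(f"line {line_number}: {column}={value!r} contains an underscore")
```

To check a real configuration, pass the file contents instead, for example find_underscore_ids(open("config/basecalls.tsv").read()). An empty result means no underscores were found in the checked columns.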

@stasys-hub

Thank you very much, @LynnLy! Gonna try that today.

@stasys-hub

@LynnLy, that solved my problems! Thank you!
