If you do not have an environment installed, or you do not have Nextflow itself, here is one way to install it.
Define the NXF_HOME environment variable to use a custom Nextflow home location instead of the default one ($HOME/.nextflow).
Everything else is unchanged from the default Nextflow installation instructions on https://www.nextflow.io/index.html#GetStarted.
# add NXF_HOME env
export NXF_HOME=$(pwd)/dot.nextflow # or whatever
# get nextflow and install almost like here: https://www.nextflow.io/index.html#GetStarted
wget -O - https://get.nextflow.io > nextflow.install.bash
# review and run
cat nextflow.install.bash | bash -i 2>&1 | tee nextflow.install.log
# run test, see https://www.nextflow.io/index.html#GetStarted
./nextflow run hello
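The "review and run" step above can be backed by a small sanity check before the installer is piped to bash. This is only a sketch: check_installer is a hypothetical helper, not part of Nextflow, and it assumes the downloaded file should be a non-empty script starting with a shebang line (so that, for instance, a saved HTML error page is rejected).

```shell
# check_installer is a hypothetical helper, not a Nextflow command.
# It returns 0 only if the file is non-empty and its first line is a shebang.
check_installer() {
  installer="$1"
  # -s: the file exists and has a size greater than zero
  [ -s "${installer}" ] || { echo "empty or missing: ${installer}" >&2; return 1; }
  # the first line of a shell script is expected to start with '#!'
  head -n 1 "${installer}" | grep -q '^#!' || { echo "no shebang: ${installer}" >&2; return 1; }
}
```

It could be used as check_installer nextflow.install.bash before the cat ... | bash step. It does not replace reading the script yourself.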
Configure the environment you are using if you have not done so yet.
Don't forget to set NXF_HOME, patch PATH, and export them.
# fix env variables, i.e.:
export NXF_HOME=$(pwd)/dot.nextflow
export PATH=$(pwd):$PATH
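Since these exports are lost when the shell exits, one option is to write them to a small file that can be sourced again later. A minimal sketch, assuming it is run from the directory that holds the nextflow launcher; the file name nextflow.env.sh is a hypothetical choice, not something Nextflow looks for.

```shell
# Hypothetical helper file name; Nextflow itself does not read this file.
env_file="$(pwd)/nextflow.env.sh"

# $(pwd) is expanded now, so the file stores absolute paths and can be
# sourced from any directory later. \${PATH} is kept literal on purpose,
# so PATH is extended at the time the file is sourced.
cat > "${env_file}" <<EOF
export NXF_HOME="$(pwd)/dot.nextflow"
export PATH="$(pwd):\${PATH}"
EOF

# Load the settings into the current shell.
. "${env_file}"
```

In a new shell, sourcing the same file (. /path/to/nextflow.env.sh) restores both variables.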
If you wish, you can set the NXF_WORK environment variable, which Nextflow uses as the location of its work directory.
export NXF_WORK=...
Or use the nextflow -e.NXF_WORK=... approach.
Ideally, this should be overridable by the -work-dir (-w) option of nextflow run.
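To avoid relying on that precedence, the work directory can be resolved explicitly and always passed with -w. The sketch below shows the order we would expect (explicit value first, then NXF_WORK, then a default). resolve_work_dir is a hypothetical helper, and the ./work fallback is an assumption based on Nextflow's default work directory.

```shell
# resolve_work_dir is a hypothetical helper, not a Nextflow command.
# Precedence: explicit argument, then $NXF_WORK, then ./work.
resolve_work_dir() {
  cli_value="$1"
  if [ -n "${cli_value}" ]; then
    printf '%s\n' "${cli_value}"
  else
    # ${NXF_WORK:-./work} falls back to ./work if NXF_WORK is unset or empty
    printf '%s\n' "${NXF_WORK:-./work}"
  fi
}

# Possible usage (commented out, since it needs nextflow on PATH):
# nextflow run hello -w "$(resolve_work_dir "")"
```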
Once the production (and Nextflow) environment is ready, you can run pipelines. For example:
CMD=<dba_alias>
mkdir -p data
pushd data
data_dir=$(pwd)
nextflow run \
-w ${data_dir}/nextflow_work \
${ENSEMBL_ROOT_DIR}/ensembl-genomio/pipelines/nextflow/workflows/dumper_pipeline/main.nf \
-profile lsf \
$(${CMD} details script) \
--dbname_re '^drosophila_melanogaster_\w+_57_.*$' \
--output_dir ${data_dir}/dumper_output
popd
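Before launching the pipeline, it can be useful to preview which database names a --dbname_re pattern would accept. A rough sketch with grep: the database names are made up for illustration, and grep's regex dialect may differ from the one the pipeline uses (\w in particular is a GNU grep extension), so treat the result as an approximation.

```shell
# Hypothetical database names, used only to illustrate the pattern.
candidates='drosophila_melanogaster_core_57_110_10
drosophila_melanogaster_core_56_109_10
anopheles_gambiae_core_57_110_5'

# The same pattern that is passed to --dbname_re in the command above.
dbname_re='^drosophila_melanogaster_\w+_57_.*$'

# Keep only the names the pattern accepts.
matched="$(printf '%s\n' "${candidates}" | grep -E "${dbname_re}")"
printf '%s\n' "${matched}"
```

With GNU grep, only the first name is printed: the second has no _57_ part and the third belongs to a different species.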
Try invoking pipelines with the --help option to get insight on how to run them.
When running a stage or a subworkflow on a channel with a single element, we expect the stream to be forked, allowing us to seed several tasks at a time.
// create that channel with a single element
// calls read_json(...) in turn, see below
dbs = from_read_json(...)
DUMP_SQL(..., dbs, ...)
DUMP_METADATA(..., dbs, ...)
Instead, the pipeline dies with
Caused by: Cannot load from object array because "this.keys" is null
and when printing this object (dbs in this case, with println "db: ${db}"), we see a dict surrounded by curly brackets, like this
{..., "db_name":"some_db_name", ...}
instead of this (with square brackets)
[..., "db_name":"some_db_name", ...]
In our case we used a read_json function similar to this one:
import groovy.json.JsonSlurper

def read_json(json_path) {
    slurp = new JsonSlurper()
    json_file = file(json_path)
    text = json_file.text
    return slurp.parseText(text) // <-- problem here
}
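This returned some kind of lazily evaluated object (we are not sure, but most likely the LazyMap that JsonSlurper builds by default, which would match the "this.keys" error above).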
Replacing return slurp.parseText(text) with
not_a_lazy_val = slurp.parseText(text)
return not_a_lazy_val
did help.