
Lab 03: Parsing Nextflow Output

Ryan edited this page Jan 8, 2024 · 4 revisions

Nextflow Output

What happened after running the exercise pipeline? We should have seen output in the shell that looks something like the following:

N E X T F L O W  ~  version 21.04.3
Launching `./pipeline/main.nf` [fabulous_bernard] - revision: 5dfc860211
executor >  local (4)
[3e/138820] process > WRITE_ODD (4) [100%] 4 of 4 ✔

This gives us:

  1. Nextflow version information
  2. The pipeline script that was launched, an automatically generated name for this run ([fabulous_bernard]), and a revision hash of the script
  3. The executor being used. In this case the pipeline runs directly in the current interactive environment, so the executor is local; the (4) is the number of tasks submitted to it.
  4. For each process, the abbreviated "work" subdirectory of its most recently submitted task ([3e/138820]), followed by the process name and its progress. Here we see that WRITE_ODD has finished running all 4 of its tasks, one for each element in the ch_odd queue channel.
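The bracketed path in the console, such as [3e/138820], is only a prefix of the task's real directory: the two-character top-level name plus the first six characters of the nested one. Below is a minimal sketch of a hypothetical helper named task_dir that completes the prefix with a glob. It assumes a POSIX shell run from the launch directory, where work/ lives.

```shell
# Hypothetical helper (not part of Nextflow): expand the abbreviated task
# path printed in the console, e.g. "3e/138820", into the full work directory.
# Assumes it is run from the launch directory that contains work/.
task_dir() {
  # The console shows only the first 6 characters of the nested directory,
  # so a trailing glob completes the rest of the name.
  for d in work/"$1"*; do
    if [ -d "$d" ]; then
      printf '%s\n' "$d"   # print the first matching directory
      return 0
    fi
  done
  # No match: the glob stayed literal and was not a directory.
  echo "task_dir: no directory matches work/$1*" >&2
  return 1
}
```

For example, `cd "$(task_dir 3e/138820)"` would move straight into the directory of the last WRITE_ODD task reported above. A glob is enough here because the prefix almost always matches a single directory, and the helper simply prints the first match.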

We can check what the run added to the launch directory, including hidden files, by running:

ls -a

...which should yield

.  ..  .nextflow  .nextflow.log  pipeline  report.html  run_01.sh  timeline.html  work

The ".nextflow" directory contains cached information and the history of previous runs of this pipeline. The log file ".nextflow.log" records detailed information about the run and is one of the most important files for debugging. "report.html" and "timeline.html" contain the summary information we requested in the run command earlier. Finally, the "work" directory contains all of the intermediate files and output produced by the pipeline. It is worth understanding how the work directory is organized, even though it can be tedious to explore by hand.
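Because ".nextflow.log" can get long, it often helps to filter it down to warnings and errors first. Below is a sketch of a hypothetical helper, log_problems. It assumes each log line carries its level (WARN, ERROR, and so on) as a separate space-delimited word; adjust the pattern if your log is laid out differently.

```shell
# Hypothetical helper (not part of Nextflow): show only WARN and ERROR lines
# of a Nextflow log, prefixed with their line numbers.
# Assumption: the log level appears as its own space-delimited word.
log_problems() {
  logfile="${1:-.nextflow.log}"   # default to the log in the launch directory
  if [ ! -f "$logfile" ]; then
    echo "log_problems: $logfile not found" >&2
    return 1
  fi
  # -n adds line numbers so you can open the full context afterwards.
  grep -nE ' (WARN|ERROR) ' "$logfile"
}
```

Run `log_problems` in the launch directory, or pass a different log file as the first argument.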

As the Nextflow pipeline runs, the work directory is populated with subdirectories whose names are two hexadecimal characters (don't worry about the names). Each of those subdirectories contains one or more nested subdirectories with much longer hexadecimal names such as "5d8ebc456a217b5483706f494ff611". Each nested subdirectory is the working directory of a single task, and here each one holds the actual output from one task of our WRITE_ODD process.
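The two directory levels are one identifier split in two: joining the two-character name with the long nested name gives the task's hash, and the console shows only its first 2 + 6 characters. Below is a sketch of a hypothetical list_tasks helper that prints the short console form next to the full hash for every task directory. It assumes a POSIX shell run from the launch directory.

```shell
# Hypothetical helper (not part of Nextflow): for each task directory under
# work/, print the short form used in the console and the full joined hash.
list_tasks() {
  for d in work/*/*; do
    [ -d "$d" ] || continue              # skip anything that is not a directory
    top=$(basename "$(dirname "$d")")    # two-character level, e.g. "28"
    rest=$(basename "$d")                # long nested name
    short="$top/$(printf '%s' "$rest" | cut -c1-6)"
    printf '%s  %s%s\n' "$short" "$top" "$rest"
  done
}
```

With the example output listed in this lab, the first line would pair 28/5d8ebc with the full hash 285d8ebc456a217b5483706f494ff611.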

Let's list the files (go to exercise 03_parsing_output if you want your output to match):

ls work/*/*

...which yields

work/28/5d8ebc456a217b5483706f494ff611:
1.txt

work/29/d59f5d5bc2f787e50b8442ee7c75c4:
3.txt

work/41/83b3062b0d4389697af9068e0b4fc2:
7.txt

work/f5/bc61f8bb9648daa4b8847cc5b48527:
5.txt

Pick one of the nested subdirectories from your own output and cd into it.

⚠️ Your directory names will be different if you're not using exercise 03_parsing_output; just pick any one for now.
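If you would rather open the directory that produced a specific output file than pick one at random, find can locate it. Below is a sketch of a hypothetical helper, find_task_for. It assumes task directories sit exactly two levels below work/ and that Nextflow's default staging of inputs as symbolic links is in effect; `-type f` then skips the links that downstream tasks create for files they consume, leaving only the task that actually wrote the file.

```shell
# Hypothetical helper (not part of Nextflow): print the task directory that
# wrote a given output file, e.g. "7.txt".
# Task directories are work/XX/YYYY..., so their files sit at depth 3.
# -type f ignores symlinks, i.e. inputs staged into downstream tasks.
find_task_for() {
  find work -mindepth 3 -maxdepth 3 -type f -name "$1" -exec dirname {} \;
}
```

For example, `cd "$(find_task_for 7.txt)"`. Note that `-mindepth` and `-maxdepth` are GNU and BSD find extensions rather than strict POSIX.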

cd work/41/83b3062b0d4389697af9068e0b4fc2
ls -a

yielding

.  ..  7.txt  .command.begin  .command.err  .command.log  .command.out  .command.run  .command.sh  .command.trace  .exitcode

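In addition to the ".txt" output file created by the pipeline, there are many hidden files that Nextflow used to run and log this particular task. Notably, ".command.sh" contains the actual commands from the process's script, with variables filled in, and ".command.run" is the wrapper script Nextflow executes to set up the task and run ".command.sh". ".exitcode" records the task's exit status. Although they are empty in this exercise, ".command.out" (standard output), ".command.err" (standard error), and ".command.log" may be critical for debugging failures down the road.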
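Because every task keeps its own copy of these files, a failed run can be triaged by checking each task's ".exitcode" (the file holding the task's exit status) and reading its ".command.err" (standard error). Below is a sketch of a hypothetical failed_tasks helper, assuming a POSIX shell in the launch directory. A missing ".exitcode" is reported too, since it can mean the task is still running or never finished.

```shell
# Hypothetical helper (not part of Nextflow): report task directories whose
# .exitcode is missing or non-zero, plus the last lines of their stderr.
failed_tasks() {
  for d in work/*/*; do
    [ -d "$d" ] || continue
    if [ -f "$d/.exitcode" ]; then
      code=$(cat "$d/.exitcode")   # command substitution drops any newline
    else
      code="missing"               # still running, or never finished
    fi
    if [ "$code" != "0" ]; then
      echo "== $d (exit code: $code)"
      # Show up to 5 trailing lines of standard error, if any was captured.
      if [ -s "$d/.command.err" ]; then
        tail -n 5 "$d/.command.err"
      fi
    fi
  done
}
```

Running `failed_tasks` after a successful run like this exercise should print nothing.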