Fatal-on-data-error mlr -x option (#1373)
* Fatal-on-data-error `mlr -x` option [WIP]

* arithmetic.go error-reason propagation

* more

* more

* more

* renames

* doc page

* namefix

* fix broken test

* make dev
johnkerl authored Aug 30, 2023
1 parent 879f272 commit 0493a0d
Showing 63 changed files with 1,649 additions and 853 deletions.
6 changes: 6 additions & 0 deletions docs/src/data-error.csv
@@ -0,0 +1,6 @@
x
1
2
3
text
4
3 changes: 3 additions & 0 deletions docs/src/manpage.md
@@ -603,6 +603,9 @@ MILLER(1) MILLER(1)
-s {file name} Take command-line flags from file name. For more
information please see
https://miller.readthedocs.io/en/latest/scripting/.
-x If any record has an error value in it, report it and
stop the process. The default is to print the field
value as `(error)` and continue.

1mOUTPUT-COLORIZATION FLAGS0m
Miller uses colors to highlight outputs. You can specify color preferences.
3 changes: 3 additions & 0 deletions docs/src/manpage.txt
@@ -582,6 +582,9 @@ MILLER(1) MILLER(1)
-s {file name} Take command-line flags from file name. For more
information please see
https://miller.readthedocs.io/en/latest/scripting/.
-x If any record has an error value in it, report it and
stop the process. The default is to print the field
value as `(error)` and continue.

1mOUTPUT-COLORIZATION FLAGS0m
Miller uses colors to highlight outputs. You can specify color preferences.
2 changes: 2 additions & 0 deletions docs/src/record-heterogeneity.md
@@ -127,6 +127,8 @@ If you `mlr --csv cat` this, you'll get an error message:
<b>mlr --csv cat data/het/ragged.csv</b>
</pre>
<pre class="pre-non-highlight-in-pair">
a,b,c
1,2,3
mlr: mlr: CSV header/data length mismatch 3 != 2 at filename data/het/ragged.csv row 3.
.
</pre>
51 changes: 50 additions & 1 deletion docs/src/reference-dsl-errors.md
@@ -16,6 +16,55 @@ Quick links:
</div>
# DSL errors and transparency

# Handling for data errors

By default, Miller doesn't stop data processing for a single cell error. For example:

<pre class="pre-highlight-in-pair">
<b>mlr --csv --from data-error.csv cat</b>
</pre>
<pre class="pre-non-highlight-in-pair">
x
1
2
3
text
4
</pre>

<pre class="pre-highlight-in-pair">
<b>mlr --csv --from data-error.csv put '$y = log10($x)'</b>
</pre>
<pre class="pre-non-highlight-in-pair">
x,y
1,0
2,0.3010299956639812
3,0.4771212547196624
text,(error)
4,0.6020599913279624
</pre>

If you do want to stop processing, though, you have three options. The first is the `mlr -x` flag:

<pre class="pre-highlight-in-pair">
<b>mlr -x --csv --from data-error.csv put '$y = log10($x)'</b>
</pre>
<pre class="pre-non-highlight-in-pair">
x,y
1,0
2,0.3010299956639812
3,0.4771212547196624
mlr: data error at NR=4 FNR=4 FILENAME=data-error.csv
mlr: field y: log10: unacceptable type string with value "text"
mlr: exiting due to data error.
</pre>

The second is to put `-x` into your [`~/.mlrrc` file](customization.md).

The third is to set the `MLR_FAIL_ON_DATA_ERROR` environment variable, which makes `-x` implicit.
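
Outside Miller itself, a pipeline can approximate the same stop-on-error behavior by checking for the `(error)` marker that the default mode writes. The following is only a rough sketch under stated assumptions: `stop_on_error_marker` is a hypothetical helper name, not part of Miller, and it splits on bare commas, so it does not handle quoted CSV fields that contain commas.

```shell
# Hypothetical helper (not part of Miller): pass CSV lines through unchanged,
# but stop with a non-zero status at the first line that has an "(error)" field.
# Assumes simple CSV with no quoted commas.
stop_on_error_marker() {
  awk -F, '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "(error)") {
          printf "data error marker at output line %d\n", NR > "/dev/stderr"
          exit 1
        }
      }
      print
    }
  '
}

# Sample input in the shape of the default (non -x) output shown above.
printf 'x,y\n1,0\ntext,(error)\n4,0.6020599913279624\n' \
  | stop_on_error_marker \
  || echo "stopped with status $?"
```

Unlike `mlr -x`, this check only sees the output text, so it cannot report which function rejected the value; it is best treated as a fallback where the flag is unavailable.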

# Common causes of syntax errors

As soon as you have a [programming language](miller-programming-language.md), you start having the problem *What is my code doing, and why?* This includes getting syntax errors -- which are always annoying -- as well as the even more annoying problem of a program which parses without syntax error but doesn't do what you expect.

The syntax-error message gives you line/column position for the syntax that couldn't be parsed. The cause may be clear from that information, or perhaps not. Here are some common causes of syntax errors:
@@ -26,7 +75,7 @@ The syntax-error message gives you line/column position for the syntax that coul

* Curly braces are required for the bodies of `if`/`while`/`for` blocks, even when the body is a single statement.

-As for transparency:
+# Transparency

* As in any language, you can do `print`, or `eprint` to print to stderr. See [Print statements](reference-dsl-output-statements.md#print-statements); see also [Dump statements](reference-dsl-output-statements.md#dump-statements) and [Emit statements](reference-dsl-output-statements.md#emit-statements).

26 changes: 25 additions & 1 deletion docs/src/reference-dsl-errors.md.in
@@ -1,5 +1,29 @@
# DSL errors and transparency

# Handling for data errors

By default, Miller doesn't stop data processing for a single cell error. For example:

GENMD-RUN-COMMAND
mlr --csv --from data-error.csv cat
GENMD-EOF

GENMD-RUN-COMMAND
mlr --csv --from data-error.csv put '$y = log10($x)'
GENMD-EOF

If you do want to stop processing, though, you have three options. The first is the `mlr -x` flag:

GENMD-RUN-COMMAND-TOLERATING-ERROR
mlr -x --csv --from data-error.csv put '$y = log10($x)'
GENMD-EOF

The second is to put `-x` into your [`~/.mlrrc` file](customization.md).

The third is to set the `MLR_FAIL_ON_DATA_ERROR` environment variable, which makes `-x` implicit.

# Common causes of syntax errors

As soon as you have a [programming language](miller-programming-language.md), you start having the problem *What is my code doing, and why?* This includes getting syntax errors -- which are always annoying -- as well as the even more annoying problem of a program which parses without syntax error but doesn't do what you expect.

The syntax-error message gives you line/column position for the syntax that couldn't be parsed. The cause may be clear from that information, or perhaps not. Here are some common causes of syntax errors:
@@ -10,7 +34,7 @@ The syntax-error message gives you line/column position for the syntax that coul

* Curly braces are required for the bodies of `if`/`while`/`for` blocks, even when the body is a single statement.

-As for transparency:
+# Transparency

* As in any language, you can do `print`, or `eprint` to print to stderr. See [Print statements](reference-dsl-output-statements.md#print-statements); see also [Dump statements](reference-dsl-output-statements.md#dump-statements) and [Emit statements](reference-dsl-output-statements.md#emit-statements).

1 change: 1 addition & 0 deletions docs/src/reference-main-flag-list.md
@@ -289,6 +289,7 @@ These are flags which don't fit into any other category.
* `-I`: Process files in-place. For each file name on the command line, output is written to a temp file in the same directory, which is then renamed over the original. Each file is processed in isolation: if the output format is CSV, CSV headers will be present in each output file, statistics are only over each file's own records; and so on.
* `-n`: Process no input files, nor standard input either. Useful for `mlr put` with `begin`/`end` statements only. (Same as `--from /dev/null`.) Also useful in `mlr -n put -v '...'` for analyzing abstract syntax trees (if that's your thing).
* `-s {file name}`: Take command-line flags from file name. For more information please see https://miller.readthedocs.io/en/latest/scripting/.
* `-x`: If any record has an error value in it, report it and stop the process. The default is to print the field value as `(error)` and continue.

## Output-colorization flags
