Fatal-on-data-error mlr -x option #1373

Merged: 22 commits, Aug 30, 2023
6 changes: 6 additions & 0 deletions docs/src/data-error.csv
@@ -0,0 +1,6 @@
x
1
2
3
text
4
3 changes: 3 additions & 0 deletions docs/src/manpage.md
@@ -603,6 +603,9 @@ MILLER(1) MILLER(1)
-s {file name} Take command-line flags from file name. For more
information please see
https://miller.readthedocs.io/en/latest/scripting/.
-x If any record has an error value in it, report it and
stop the process. The default is to print the field
value as `(error)` and continue.

OUTPUT-COLORIZATION FLAGS
Miller uses colors to highlight outputs. You can specify color preferences.
3 changes: 3 additions & 0 deletions docs/src/manpage.txt
@@ -582,6 +582,9 @@ MILLER(1) MILLER(1)
-s {file name} Take command-line flags from file name. For more
information please see
https://miller.readthedocs.io/en/latest/scripting/.
-x If any record has an error value in it, report it and
stop the process. The default is to print the field
value as `(error)` and continue.

OUTPUT-COLORIZATION FLAGS
Miller uses colors to highlight outputs. You can specify color preferences.
2 changes: 2 additions & 0 deletions docs/src/record-heterogeneity.md
@@ -127,6 +127,8 @@ If you `mlr --csv cat` this, you'll get an error message:
<b>mlr --csv cat data/het/ragged.csv</b>
</pre>
<pre class="pre-non-highlight-in-pair">
a,b,c
1,2,3
mlr: mlr: CSV header/data length mismatch 3 != 2 at filename data/het/ragged.csv row 3.
.
</pre>
51 changes: 50 additions & 1 deletion docs/src/reference-dsl-errors.md
@@ -16,6 +16,55 @@ Quick links:
</div>
# DSL errors and transparency

# Handling for data errors

By default, Miller doesn't stop data processing for a single cell error. For example:

<pre class="pre-highlight-in-pair">
<b>mlr --csv --from data-error.csv cat</b>
</pre>
<pre class="pre-non-highlight-in-pair">
x
1
2
3
text
4
</pre>

<pre class="pre-highlight-in-pair">
<b>mlr --csv --from data-error.csv put '$y = log10($x)'</b>
</pre>
<pre class="pre-non-highlight-in-pair">
x,y
1,0
2,0.3010299956639812
3,0.4771212547196624
text,(error)
4,0.6020599913279624
</pre>

If you do want to stop processing, though, you have three options. The first is the `mlr -x` flag:

<pre class="pre-highlight-in-pair">
<b>mlr -x --csv --from data-error.csv put '$y = log10($x)'</b>
</pre>
<pre class="pre-non-highlight-in-pair">
x,y
1,0
2,0.3010299956639812
3,0.4771212547196624
mlr: data error at NR=4 FNR=4 FILENAME=data-error.csv
mlr: field y: log10: unacceptable type string with value "text"
mlr: exiting due to data error.
</pre>

The second is to put `-x` into your [`~/.mlrrc` file](customization.md).

The third is to set the `MLR_FAIL_ON_DATA_ERROR` environment variable, which makes `-x` implicit.
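
As a minimal sketch of the third option, the script below recreates the sample file shown earlier, sets `MLR_FAIL_ON_DATA_ERROR` for a single invocation, and branches on the outcome. It assumes, without confirmation from the text above, that any non-empty value enables the behavior and that Miller exits with a non-zero status after reporting a data error. The `workdir` and `status` names are illustrative only.

```shell
#!/bin/sh
set -u

# Scratch directory for the sample input; removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Same contents as data-error.csv above: one non-numeric cell in column x.
cat > "$workdir/data-error.csv" <<'EOF'
x
1
2
3
text
4
EOF

if command -v mlr >/dev/null 2>&1; then
  # Assumption: a non-empty value makes -x implicit for this one command,
  # and mlr exits non-zero once it reports the data error.
  if MLR_FAIL_ON_DATA_ERROR=1 mlr --csv --from "$workdir/data-error.csv" put '$y = log10($x)'; then
    status="clean"
  else
    status="data-error"
  fi
else
  # Miller is not on PATH, so there is nothing to run.
  status="mlr-not-installed"
fi

echo "result: $status"
```

Scoping the variable to one command leaves other `mlr` calls in the same shell on the default behavior; use `export` instead if the whole session should stop on data errors.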

# Common causes of syntax errors

As soon as you have a [programming language](miller-programming-language.md), you start having the problem *What is my code doing, and why?* This includes getting syntax errors -- which are always annoying -- as well as the even more annoying problem of a program which parses without syntax error but doesn't do what you expect.

The syntax-error message gives you line/column position for the syntax that couldn't be parsed. The cause may be clear from that information, or perhaps not. Here are some common causes of syntax errors:
@@ -26,7 +75,7 @@ The syntax-error message gives you line/column position for the syntax that coul

* Curly braces are required for the bodies of `if`/`while`/`for` blocks, even when the body is a single statement.

-As for transparency:
+# Transparency

* As in any language, you can do `print`, or `eprint` to print to stderr. See [Print statements](reference-dsl-output-statements.md#print-statements); see also [Dump statements](reference-dsl-output-statements.md#dump-statements) and [Emit statements](reference-dsl-output-statements.md#emit-statements).

26 changes: 25 additions & 1 deletion docs/src/reference-dsl-errors.md.in
@@ -1,5 +1,29 @@
# DSL errors and transparency

# Handling for data errors

By default, Miller doesn't stop data processing for a single cell error. For example:

GENMD-RUN-COMMAND
mlr --csv --from data-error.csv cat
GENMD-EOF

GENMD-RUN-COMMAND
mlr --csv --from data-error.csv put '$y = log10($x)'
GENMD-EOF

If you do want to stop processing, though, you have three options. The first is the `mlr -x` flag:

GENMD-RUN-COMMAND-TOLERATING-ERROR
mlr -x --csv --from data-error.csv put '$y = log10($x)'
GENMD-EOF

The second is to put `-x` into your [`~/.mlrrc` file](customization.md).

The third is to set the `MLR_FAIL_ON_DATA_ERROR` environment variable, which makes `-x` implicit.

# Common causes of syntax errors

As soon as you have a [programming language](miller-programming-language.md), you start having the problem *What is my code doing, and why?* This includes getting syntax errors -- which are always annoying -- as well as the even more annoying problem of a program which parses without syntax error but doesn't do what you expect.

The syntax-error message gives you line/column position for the syntax that couldn't be parsed. The cause may be clear from that information, or perhaps not. Here are some common causes of syntax errors:
@@ -10,7 +34,7 @@ The syntax-error message gives you line/column position for the syntax that coul

* Curly braces are required for the bodies of `if`/`while`/`for` blocks, even when the body is a single statement.

-As for transparency:
+# Transparency

* As in any language, you can do `print`, or `eprint` to print to stderr. See [Print statements](reference-dsl-output-statements.md#print-statements); see also [Dump statements](reference-dsl-output-statements.md#dump-statements) and [Emit statements](reference-dsl-output-statements.md#emit-statements).

1 change: 1 addition & 0 deletions docs/src/reference-main-flag-list.md
@@ -289,6 +289,7 @@ These are flags which don't fit into any other category.
* `-I`: Process files in-place. For each file name on the command line, output is written to a temp file in the same directory, which is then renamed over the original. Each file is processed in isolation: if the output format is CSV, CSV headers will be present in each output file, statistics are only over each file's own records; and so on.
* `-n`: Process no input files, nor standard input either. Useful for `mlr put` with `begin`/`end` statements only. (Same as `--from /dev/null`.) Also useful in `mlr -n put -v '...'` for analyzing abstract syntax trees (if that's your thing).
* `-s {file name}`: Take command-line flags from file name. For more information please see https://miller.readthedocs.io/en/latest/scripting/.
* `-x`: If any record has an error value in it, report it and stop the process. The default is to print the field value as `(error)` and continue.

## Output-colorization flags
