Validation mode currently reports validation errors using csv2rdf.logging/log-warning, which prints them to stdout via either clojure.tools.logging (the default) or println (on graalvm), depending on which logger has been configured. This works fine when using the tool from the command line, but it may be useful to report the errors in a more structured way, so that they can be ingested elsewhere (e.g. airflow) and interrogated more thoroughly. The proposal is as follows:
Allow an extra command line argument (--error-formatter) which specifies how the errors should be formatted. It can take the following values:
"default" -> the current behaviour of printing to stdout (this could also be named "stdout")
"csv" -> the errors are written to a csv file
In the latter case, we can re-use the existing (--output-file) argument to optionally specify the local file to write the errors to.
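As a rough, language-neutral sketch of how the two arguments could interact (written in Python for brevity, not as csv2rdf's actual CLI code): the function name `resolve_error_sink` and the fallback file name `errors.csv` are assumptions made for illustration, since the proposal only says the file name is optional.

```python
import argparse


def build_parser():
    # Only the two arguments discussed in this proposal are modelled here.
    parser = argparse.ArgumentParser(prog="csv2rdf-sketch")
    parser.add_argument(
        "--error-formatter",
        choices=["default", "csv"],
        default="default",
        help="how validation errors should be reported",
    )
    parser.add_argument(
        "--output-file",
        default=None,
        help="where to write the errors when the csv formatter is selected",
    )
    return parser


def resolve_error_sink(argv):
    """Return a (kind, path) pair describing where errors should go."""
    args = build_parser().parse_args(argv)
    if args.error_formatter == "csv":
        # Assumption: fall back to a fixed file name when --output-file is omitted.
        return ("csv", args.output_file or "errors.csv")
    # "default" keeps the current behaviour of printing to stdout.
    return ("stdout", None)
```

For example, `resolve_error_sink(["--error-formatter", "csv", "--output-file", "report.csv"])` would select the csv formatter writing to `report.csv`, while passing no arguments keeps the stdout behaviour.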
The csv file would have headers:
file (the name of the data file / where the error is located)
row (the row corresponding to the invalid cell)
col (the column corresponding to the invalid cell)
message (some text to describe the error in detail, which could include any metadata required to contextualise the error, although this context/metadata could arguably be placed in a separate column)
error_type (either cell or schema, to indicate the type of error)
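To make the proposed columns concrete, here is a minimal sketch of writing such a report, again in Python rather than as the tool's implementation. The helper names (`write_error_report`, `render`) and the sample error are hypothetical, and leaving row/col empty for schema errors is an assumption rather than something the proposal settles.

```python
import csv
import io

# Column order follows the headers listed in the proposal.
HEADERS = ["file", "row", "col", "message", "error_type"]
ERROR_TYPES = ("cell", "schema")


def write_error_report(errors, out):
    """Write an iterable of error dicts to the file-like object `out` as CSV."""
    writer = csv.DictWriter(out, fieldnames=HEADERS, lineterminator="\n")
    writer.writeheader()
    for error in errors:
        if error.get("error_type") not in ERROR_TYPES:
            raise ValueError(f"unknown error_type: {error.get('error_type')!r}")
        # Assumption: missing fields (e.g. row/col on a schema error) are left empty.
        writer.writerow({h: error.get(h, "") for h in HEADERS})


def render(errors):
    """Convenience wrapper returning the CSV report as a string."""
    buf = io.StringIO()
    write_error_report(errors, buf)
    return buf.getvalue()


# Hypothetical example error, not real csv2rdf output.
sample = [
    {"file": "data.csv", "row": 3, "col": 2,
     "message": "not a valid integer", "error_type": "cell"},
]
report = render(sample)
```

Using the standard csv writer means messages containing commas or quotes are escaped automatically, which matters if the message column ends up carrying contextual metadata.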
Further to this, it may be worth generating a schema file to clarify the contents of the errors, but this is up for discussion.
We could also support another column of metadata about errors, something like error_level or error_severity, which might at some point in the future be extended to range from hard errors like we have now to more “linted” best-practice violations, warnings, etc.