Refer to the "Reviewed" column as the Validation column
louisdorard committed Nov 18, 2024
1 parent f24f080 commit 98ef06e
Showing 5 changed files with 21 additions and 19 deletions.
2 changes: 1 addition & 1 deletion docs/editschema.md
@@ -17,7 +17,7 @@ For each column of the original dataset you can set the following fields:
```json
[
{
-"name": "reviewed",
+"name": "validated",
"type": "boolean_tick"
}
]
20 changes: 10 additions & 10 deletions docs/validate.md
@@ -2,36 +2,36 @@

## Use case description

-We want business users (aka end-users) to validate/review machine-generated data and make corrections as needed, based on their domain expertise. From a business perspective, there are two sub use cases: we may use the human-reviewed, machine-generated data...
+We want business users (aka end-users) to validate machine-generated data and make corrections as needed, based on their domain expertise. From a business perspective, there are two sub use cases: we may use the human-reviewed, machine-generated data...

* for mass corrections or enrichment of source data;
* as input to an operational process.

-Machine-generated data would be stored in the output dataset of an existing data pipeline. Each row would correspond to an item to review. Columns would include:
+Machine-generated data would be stored in the output dataset of an existing data pipeline. Each row would correspond to an item to validate. Columns would include:

* primary keys;
* machine-generated columns, whose values would change if the pipeline or its algorithms change;
-* display-only columns, whose values would help the end-user figure out how to review/edit/provide feedback.
+* display-only columns, whose values would help the end-user figure out how to validate/edit/provide feedback.

-Instead of exporting this dataset to Excel, we want end-users to access a web interface to review and correct the data. In addition to the above columns, we would want 2 feedback columns: one to mark rows as “Reviewed” (via checkboxes) and one to write comments.
+Instead of exporting this dataset to Excel, we want end-users to access a web interface to validate and correct the data. In addition to the above columns, we would want 2 feedback columns: one to mark rows as valid (via checkboxes) and one to write comments.

-## Special behavior of the validation column ("Reviewed")
+## Special behavior of the validation column

-The webapp’s backend implements special behavior when a cell from a column named Reviewed is edited: values of all editable columns from the same row are logged (even if they weren’t edited).
+The webapp’s backend implements special behavior when a cell from a column named "Validated" or "Reviewed" is edited: values of all editable columns from the same row are logged (even if they weren’t edited).

-This allows the _editlog_ to include not just the information that the row was reviewed, but also the actual values that were reviewed. This is particularly useful when those values were generated by an algorithm, because they may change if the algorithm changes.
+This allows the _editlog_ to include not just the information that the row is valid, but also the actual values that were validated. This is particularly useful when those values were generated by an algorithm, because they may change if the algorithm changes.

-As a result, there will be no missing value in the machine-generated and human-reviewed columns that are present in the _edits_ dataset, for rows marked as Reviewed.
+As a result, there will be no missing value in the machine-generated and human-reviewed columns that are present in the _edits_ dataset, for rows marked as valid.

## How-to

You must be familiar with the initial [How to Use guide](https://www.dataiku.com/product/plugins/visual-edit/#how-to-use) before following the steps below.

* **Add feedback columns to the dataset to review**: this can be done via code in the existing data pipeline, or with an additional Prepare recipe, as columns with missing values to serve as placeholders in the webapp.
* **When creating a Visual Edit webapp**: make sure to select all machine-generated columns and feedback columns as editable.
-* **When using the webapp**: you would review values in generated columns (mark as reviewed, or edit values and add notes when necessary) and fill in missing values.
+* **When using the webapp**: you would review values in generated columns (mark as valid, or edit values and add notes when necessary) and fill in missing values.
* **When building the Flow and defining the _update source_ scenario**: you would typically want to notify end-users via email if there is new data to review.
-* **Test with IT**: share the _edits_ dataset with IT for them to propagate or leverage edits in other IT systems; columns of this dataset include primary keys, machine-generated and human-reviewed columns, a boolean Reviewed column, and additional human feedback columns.
+* **Test with IT**: share the _edits_ dataset with IT for them to propagate or leverage edits in other IT systems; columns of this dataset include primary keys, machine-generated and human-reviewed columns, a boolean validation column, and additional human feedback columns.

## Next

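The "no missing value" claim in docs/validate.md above can be illustrated with a minimal sketch. It does not use the plugin's code: the entry layout, the sample values, and `pivot_editlog` are hypothetical, and it assumes the _edits_ dataset keeps the last value logged for each cell.

```python
# Hypothetical editlog entries (not the plugin's actual schema).
# Ticking the validation column logs one entry per editable column,
# including columns the user did not touch, then the tick itself.
editlog = [
    {"key": 1, "column_name": "category", "value": "Hardware"},
    {"key": 1, "column_name": "price", "value": 9.99},
    {"key": 1, "column_name": "validated", "value": True},
    # Row 2 only received a plain edit and was never validated.
    {"key": 2, "column_name": "price", "value": 5.0},
]


def pivot_editlog(entries):
    """Replay entries in order; the last value logged for a cell wins (assumption)."""
    edits = {}
    for entry in entries:
        edits.setdefault(entry["key"], {})[entry["column_name"]] = entry["value"]
    return edits


edits = pivot_editlog(editlog)
```

In this sketch the validated row 1 ends up with a value for every editable column, while row 2 only carries the single column that was edited.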
14 changes: 8 additions & 6 deletions dss-plugin-visual-edit/python-lib/DataEditor.py
@@ -490,19 +490,21 @@ def update_row(
         """
         key = get_key_values_from_dict(primary_keys, self.primary_keys)

-        def is_reviewed_column(column_name: str):
-            return column_name == "Reviewed" or column_name == "reviewed"
+        def is_validation_column(column_name: str):
+            return (
+                column_name.lower() == "reviewed" or column_name.lower() == "validated"
+            )

         def is_comments_column(column_name: str):
             return column_name == "Comments" or column_name == "comments"

-        # for reviewed column, create an editlog for each columns to enforce values even after a change in the original.
-        # Append the reviewed value change last in case something goes wrong during updates of the previous column values.
+        # When updating the validation column, we first create a log entry for each editable column, to enforce values even after a change in the original.
+        # We then log the new value of the validation column.
         # To improve this, the best would be to do all the inserts in the same transaction.
-        if is_reviewed_column(column):
+        if is_validation_column(column):
             results = []
             for col in self.editable_column_names:
-                if not is_comments_column(col) and not is_reviewed_column(col):
+                if not is_comments_column(col) and not is_validation_column(col):
                     # contains values for primary keys — and other columns too, but they'll be discarded
                     results.append(self.__log_edit__(key, col, primary_keys[col]))
             results.append(self.__log_edit__(key, column, primary_keys[column]))
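The matching rules in the DataEditor.py hunk above can be tried in isolation with the standalone sketch below. The two predicates mirror the diff; `columns_to_snapshot` and the sample column names are hypothetical and only stand in for the loop over `self.editable_column_names`.

```python
VALIDATION_COLUMN_NAMES = ("reviewed", "validated")


def is_validation_column(column_name: str) -> bool:
    """Case-insensitive match, as in the updated helper."""
    return column_name.lower() in VALIDATION_COLUMN_NAMES


def is_comments_column(column_name: str) -> bool:
    """Exact match on two spellings only, as in the diff."""
    return column_name in ("Comments", "comments")


def columns_to_snapshot(editable_column_names):
    """Hypothetical helper: columns whose values get logged when a row is validated."""
    return [
        col
        for col in editable_column_names
        if not is_comments_column(col) and not is_validation_column(col)
    ]


snapshot = columns_to_snapshot(["price", "Comments", "Validated", "category"])
```

One consequence of the new rule is that any capitalization of the two names, such as `REVIEWED`, now triggers the special behavior, whereas the comments check still matches only `Comments` and `comments`.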
2 changes: 1 addition & 1 deletion dss-plugin-visual-edit/tests/README.md
@@ -8,7 +8,7 @@ We test the following:

- Editing of columns of type int, float, and string.
- Editing datasets with a single primary key and with several primary key columns.
-- Editing of a 'Reviewed' column (which should have a [special behavior](https://dataiku.github.io/dss-visual-edit/validate#special-behavior-of-the-validation-column-reviewed))
+- Editing of a validation column (which should have a [special behavior](https://dataiku.github.io/dss-visual-edit/validate#special-behavior-of-the-validation-column))
- Impact of the `labels` and `lookup_columns` configuration variables of Linked Records
- Impact of the `authorized_users` configuration variable on the ability to make edits.

2 changes: 1 addition & 1 deletion dss-plugin-visual-edit/webapps/visual-edit/backend.py
@@ -196,7 +196,7 @@ def add_edit(cell):
"""
Record edit in editlog, once a cell has been edited
-If the cell is in the Reviewed column, we also update values for all other editable columns in the same row (except Comments). The values in these columns are generated by the upstream data flow and subject to change. We record them, in case the user didn't edit them before marking the row as reviewed.
+If the cell is in a validation column, we also update values for all other editable columns in the same row (except Comments). The values in these columns are generated by the upstream data flow and subject to change. We record them, in case the user didn't edit them before marking the row as valid.
"""

row_dic = cell["row"]
