Text edits
JaniceManwiller authored Nov 21, 2024
1 parent fa324c1 commit 4673c77
Showing 1 changed file with 12 additions and 12 deletions.
24 changes: 12 additions & 12 deletions docs/source/redact/index.rst
@@ -1,33 +1,33 @@
Redact
=============

The Textual redact functionality allows you to identify entities in files, and then optionally tokenize/synthesize these entities to create a safe version of your unstructured text. This functionality works on both raw strings and files, including PDF, DOCX, XLSX, and other formats.
The Textual redact functionality allows you to identify entities in files, and then optionally tokenize or synthesize these entities to create a safe version of your unstructured text. This functionality works on both raw strings and files, including PDF, DOCX, XLSX, and other formats.

Before you can use these functions, read the :doc:`Getting started <../quickstart/getting_started>` guide and create an API key.

Redacting Text
Redacting text
-----------------

You can redact text directly in a variety of formats such as plain text, json, xml, and html. All redaction requests return a response which includes the original text, redacted text, a list of found entities and their locations. Additionally all redact functions allow you to specify which entities are tokenized and which are synthesized.
You can redact text directly in a variety of formats, such as plain text, JSON, XML, and HTML. All redaction requests return a response that includes the original text, redacted text, a list of found entities, and the entity locations. All redact functions also allow you to specify which entities to tokenize and which to synthesize.

The common set of inputs to are redact functions are:
The common set of inputs to redact functions is:

* **generator_default**
The default operation performed on an entity. The options are 'Redact', 'Synthesis', and 'Off'
The default operation to perform on an entity. The options are 'Redact', 'Synthesis', and 'Off'.
* **generator_config**
A dictionary whose keys are entity labels and values are how to redact the entity. The options are 'Redact', 'Synthesis', and 'Off'.
A dictionary where the keys are entity labels and the values are how to redact the entity. The options are 'Redact', 'Synthesis', and 'Off'.

Example: {'NAME_GIVEN': 'Synthesis'}
* **label_allow_lists**
A dictionary whose keys are entity labels and values are lists of regexes. If a piece of text matches a regex it is flagged as that entity type.
A dictionary where the keys are entity labels and the values are lists of regular expressions. If a piece of text matches a regular expression, it is flagged as that entity type.

Example: {'HEALTHCARE_ID': [r'[a-zA-Z]{3}\\d{6,}']}
* **label_block_lists**
A dictionary whose keys are entity labels and values are lists of regexes. If a piece of text matches a regex it is ignored for that entity type.
A dictionary where the keys are entity labels and the values are lists of regular expressions. If a piece of text matches a regular expression, it is ignored for that entity type.

Example: {'NUMERIC_VALUE': [r'\\d{3}']}

The JSON and XML redact functions also have additional inputs which you can read about in their respective sections.
The JSON and XML redact functions also have additional inputs, which you can read about in their respective sections.
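
As a rough illustration of the four common inputs listed above, the following sketch builds them as plain Python dictionaries and checks them locally before they are passed to a redact function. The helper names ``validate_inputs`` and ``matches_any`` are hypothetical and are not part of the SDK. The use of ``re.search`` is only an assumption about how a pattern might be applied; the service may match text differently.

```python
import re

# The three operations that the documentation lists for an entity.
ALLOWED_OPTIONS = {"Redact", "Synthesis", "Off"}

# Illustrative values for the four common inputs described above.
redact_inputs = {
    "generator_default": "Redact",
    "generator_config": {"NAME_GIVEN": "Synthesis"},
    "label_allow_lists": {"HEALTHCARE_ID": [r"[a-zA-Z]{3}\d{6,}"]},
    "label_block_lists": {"NUMERIC_VALUE": [r"\d{3}"]},
}


def validate_inputs(inputs):
    """Hypothetical local check: option names are valid and every regex compiles."""
    if inputs["generator_default"] not in ALLOWED_OPTIONS:
        raise ValueError(f"Unknown default option: {inputs['generator_default']}")
    for label, option in inputs["generator_config"].items():
        if option not in ALLOWED_OPTIONS:
            raise ValueError(f"Unknown option for {label}: {option}")
    for key in ("label_allow_lists", "label_block_lists"):
        for label, patterns in inputs[key].items():
            for pattern in patterns:
                re.compile(pattern)  # raises re.error if the pattern is malformed
    return True


def matches_any(patterns, text):
    """Return True if any pattern is found anywhere in the text (an assumption)."""
    return any(re.search(pattern, text) for pattern in patterns)


validate_inputs(redact_inputs)
# Three letters followed by six or more digits would be flagged as HEALTHCARE_ID.
print(matches_any(redact_inputs["label_allow_lists"]["HEALTHCARE_ID"], "ID ABC123456 on file"))
```

If the dictionaries pass this check, they could then be supplied to a redact function as the keyword arguments named in the list above.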

.. toctree::
:hidden:
@@ -42,7 +42,7 @@ Textual can also identify entities within files, including PDF, DOCX, XLSX, CSV,

Textual can then recreate these files with entities that are redacted or synthesized.

To generated redacted/synthesized files:
To generate redacted and synthesized files:

.. code-block:: python
@@ -71,9 +71,9 @@ To learn more about how to generate redacted and synthesized files, go to :doc:`
Working with datasets
---------------------

A dataset is a feature in the Textual UI. It is a collection of files that all share the same redaction/synthesis configuration.
A dataset is a feature in the Textual application. It is a collection of files that all share the same redaction and synthesis configuration.

To help automate workflows, you can work with datasets directly from the SDK. To learn more about how you can use the SDK to work with datasets, go to :doc:`Datasets <datasets>`.
To help automate workflows, you can work with datasets directly from the SDK. To learn more about how to use the SDK to work with datasets, go to :doc:`Datasets <datasets>`.


.. toctree::