GitHub - lz-chen/pii_anonymizer

PII Anonymizer

This is a dockerized Flask app for anonymizing Personally Identifiable Information (PII) in text, such as person names, phone numbers, and credit card numbers. The Docker image can be deployed either on premises or to the cloud (this repository contains example scripts for deploying to AWS).

The app uses the Presidio library for detecting and anonymizing PII. The supported entities are listed in the Presidio documentation. Currently this app only supports PII anonymization for texts in English and Norwegian.

Getting started

Deploy the app locally as a REST API and invoke it

Run start_up.sh from the root directory, optionally passing two arguments: the image name and the port number. For example:

./start_up.sh pii-anonymizer 8989

This script does two things:

1. Builds the Docker image with the provided name. If no name is provided, the image is named pii-anonymizer by default.
2. Serves the Docker app, listening on the provided port of the host machine. The default port number is 8989.

You can then invoke the API by running:

scripts/predict.sh data/input.json

The API expects text data in JSON format, as follows:

{"input":
  [
    {"text" : "Hello Paulo Santos. The latest statement for your credit card account 1111-0000-1111-0000 was mailed to 123 Any Street, Seattle, WA 98109.", "lang": "en"},
    {"text" : "My phone number is 212-555-5555", "lang": "en"},
    {"text": "Hello this is Jamie Clark calling", "lang": "en"}
  ],
  "mode": "tagged_text"
}
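
As a rough sketch, the request body shown above could also be built and wrapped in an HTTP request from Python. The helper names `build_payload` and `build_request` are made up for illustration, and both the `/predict` route and the use of POST are assumptions: the actual call is whatever scripts/predict.sh sends.

```python
import json
import urllib.request

# Values taken from this README; anything else is rejected early.
ALLOWED_LANGS = {"en", "no", "unknown"}
ALLOWED_MODES = {"tagged_text", "detailed_info"}


def build_payload(texts, mode="tagged_text", lang="unknown"):
    """Return the request body as a dict in the documented format."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    if lang not in ALLOWED_LANGS:
        raise ValueError(f"unsupported lang: {lang!r}")
    return {
        "input": [{"text": text, "lang": lang} for text in texts],
        "mode": mode,
    }


def build_request(payload, url):
    """Wrap the payload in a request object; nothing is sent until urlopen is called."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the API accepts POST
    )


payload = build_payload(["My phone number is 212-555-5555"], lang="en")
# Hypothetical URL: the route served by the container is not documented here.
request = build_request(payload, "http://localhost:8989/predict")
print(json.dumps(payload, indent=2))
```

Passing the request to `urllib.request.urlopen` would send it once the container is running. Validating `lang` and `mode` on the client side simply avoids a round trip for values the README does not list.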

The "lang" field specifies the language of the text; supported values are "en", "no", and "unknown". Specifying the language saves time, since the anonymizer then does not need to load the language detection module and run the detector.

The "mode" field selects the output format. With "tagged_text", the result is the input text with each PII masked by a tag such as <PERSON> or <PHONE_NUMBER>. For example:

{
  "output": {
    "output": [
      {
        "tagged_text": "Hello <PERSON>. The latest statement for your credit card account <CREDIT_CARD> was mailed to 123 Any Street, <LOCATION>, WA 98109."
      },
      {
        "tagged_text": "My phone number is <PHONE_NUMBER>"
      },
      {
        "tagged_text": "Hello this is <PERSON> calling"
      }
    ]
  }
}
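
A minimal sketch of reading such a response in Python, assuming the nested "output" structure shown above. The function name `extract_tagged_texts` is hypothetical, and the sample body reuses two entries from the example.

```python
import json

# Two entries copied from the example response above.
response_body = """
{
  "output": {
    "output": [
      {"tagged_text": "My phone number is <PHONE_NUMBER>"},
      {"tagged_text": "Hello this is <PERSON> calling"}
    ]
  }
}
"""


def extract_tagged_texts(response):
    """Return the masked strings, one per input text, in input order."""
    return [item["tagged_text"] for item in response["output"]["output"]]


tagged = extract_tagged_texts(json.loads(response_body))
for line in tagged:
    print(line)
```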

With "detailed_info", the result is a detailed record per detected PII, containing the start index, end index, entity type, confidence score, and the entity text itself. For example:

{
  "output": {
    "output": [
      {
        "detailed_info": [
          {
            "entity_type": "PERSON",
            "start": 6,
            "end": 18,
            "score": 0.85,
            "entity": "Paulo Santos"
          },
          {
            "entity_type": "LOCATION",
            "start": 120,
            "end": 127,
            "score": 0.85,
            "entity": "Seattle"
          }
        ]
      },
      {
        "detailed_info": [
          {
            "entity_type": "PHONE_NUMBER",
            "start": 19,
            "end": 31,
            "score": 0.75,
            "entity": "212-555-5555"
          }
        ]
      },
      {
        "detailed_info": [
          {
            "entity_type": "PERSON",
            "start": 14,
            "end": 25,
            "score": 0.85,
            "entity": "Jamie Clark"
          }
        ]
      }
    ]
  }
}
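
In the example above, "start" and "end" behave like Python slice bounds (start inclusive, end exclusive). The sketch below, using a hypothetical helper `mask_entities`, shows how a client could rebuild the tagged text from a "detailed_info" result. It illustrates the offsets only and is not the app's own masking code.

```python
def mask_entities(text, entities):
    """Replace each detected span with a <ENTITY_TYPE> tag.

    Spans are applied from right to left so that earlier offsets stay
    valid while the string changes length.
    """
    masked = text
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        tag = f"<{ent['entity_type']}>"
        masked = masked[:ent["start"]] + tag + masked[ent["end"]:]
    return masked


text = "Hello this is Jamie Clark calling"
entities = [
    {
        "entity_type": "PERSON",
        "start": 14,
        "end": 25,
        "score": 0.85,
        "entity": "Jamie Clark",
    }
]

# The offsets slice out exactly the reported entity.
print(text[14:25])                    # Jamie Clark
print(mask_entities(text, entities))  # Hello this is <PERSON> calling
```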

Deploy to AWS

Push the image to ECR

Push the image to ECR by running the following command from the /container directory:
```
./build_and_push.sh
```
Create and configure cluster in AWS ECS
Create a task definition
Start a service
