JSON Configuration #11

Open
1 task done
darylldoyle opened this issue Jan 24, 2024 · 1 comment · May be fixed by #13
Labels: type:enhancement (New feature or request.)

Comments

@darylldoyle

Is your enhancement related to a problem? Please describe.

PII and other sensitive data can be stored throughout a WordPress database. Whilst this project does a good job of scrubbing Users and Comments, anything beyond that requires engineers to hook into the scrubbing process and manually scrub the data on each project.

Allowing engineers to set up a wp-scrubber.json file that outlines the data and fields to be scrubbed could be an easy way to lower the barrier to entry for projects.

Designs

My idea for the structure would look something like this:

  1. Post Types

    • name: Identifies the post type (e.g., post, page, custom post types).
    • fields: Lists the fields within the post type for scrubbing.
    • post_meta: Specifies post_meta fields and actions.
  2. Taxonomies

    • name: Specifies the taxonomy (e.g., category, tag).
    • terms: Defines terms within the taxonomy for scrubbing.
    • term_meta: Details term_meta fields and scrubbing actions.
  3. Options

    • Lists WordPress options (e.g., admin_email, API Keys etc) for scrubbing.
  4. User Data

    • Covers user data fields (e.g., user_email, display_name) for scrubbing.
  5. Custom Tables

    • name: Names of custom database tables.
    • columns: Specifies columns in these tables for scrubbing.
  6. Truncate Tables

    • Lists tables for complete truncation.
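
As a rough illustration of the six sections above, a loader could reject typos in the top-level keys before doing any work. This is only a sketch: the key names are taken from the example config in this issue, and `unknown_sections` is a hypothetical helper, not part of the project.

```python
# Top-level keys as proposed in this issue's example config.
KNOWN_SECTIONS = {
    "post_types",
    "taxonomies",
    "options",
    "user_data",
    "custom_tables",
    "truncate_tables",
}


def unknown_sections(config: dict) -> list[str]:
    """Return any top-level keys that are not one of the proposed sections."""
    return sorted(set(config) - KNOWN_SECTIONS)
```

Failing early on an unknown key (say, `postmeta` instead of `post_types`) would avoid a config that silently scrubs less than the engineer intended.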

Each section of fields (fields, post_meta, columns etc) above would hold an array of objects, each of which has the following properties:

  • action: Defines the scrubbing action (faker, replace, remove).
    • faker: This would use https://fakerphp.github.io/ to set mock data.
    • replace: This would replace the data with a set string.
    • remove: This would replace the data with an empty string or delete the row, depending on the context.
  • faker_type: When the action is faker, specifies the type of fake data to generate (e.g., name, email).
  • value: For the replace action, the string that replaces the original data.
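
To make the three actions concrete, here is a minimal sketch of how a single rule object might be turned into a scrubbed value. It is written in Python purely for illustration. The real implementation would presumably be PHP and use Faker, so `FAKE_GENERATORS` is a hypothetical stand-in for Faker's formatters, and `scrub_value` is an assumed name. The sketch only covers the "empty string" flavour of remove, not row deletion.

```python
import random
import string

# Hypothetical stand-ins for Faker formatters, keyed by faker_type.
FAKE_GENERATORS = {
    "word": lambda rng: "".join(rng.choice(string.ascii_lowercase) for _ in range(8)),
    "email": lambda rng: f"user{rng.randrange(10**6)}@example.com",
}


def scrub_value(rule: dict, rng: random.Random) -> str:
    """Return the value that should overwrite the original data for one rule."""
    action = rule.get("action")
    if action == "replace":
        # Overwrite with the fixed string from the rule.
        return rule["value"]
    if action == "remove":
        # Context-free case only: blank the value rather than delete a row.
        return ""
    if action == "faker":
        faker_type = rule["faker_type"]
        if faker_type not in FAKE_GENERATORS:
            raise ValueError(f"unsupported faker_type: {faker_type!r}")
        return FAKE_GENERATORS[faker_type](rng)
    raise ValueError(f"unknown action: {action!r}")
```

Passing the random generator in explicitly is a design choice for the sketch: seeding it makes scrub runs reproducible, which may or may not be desirable in practice.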

Put together, this would look something like:

{
    "post_types":
    [
        {
            "name": "post",
            "fields":
            [
                {
                    "name": "post_title",
                    "action": "faker",
                    "faker_type": "sentence"
                }
            ],
            "post_meta":
            [
                {
                    "key": "pii_containing_meta",
                    "action": "replace",
                    "value": "this string doesn't contain PII"
                }
            ]
        }
    ],
    "taxonomies":
    [
        {
            "name": "custom_taxonomy",
            "terms":
            [
                {
                    "name": "term_name",
                    "action": "faker",
                    "faker_type": "word"
                }
            ],
            "term_meta":
            [
                {
                    "key": "pii_containing_meta",
                    "action": "replace",
                    "value": "this string doesn't contain PII"
                }
            ]
        }
    ],
    "options":
    [
        {
            "name": "google_maps_api_key",
            "action": "remove"
        }
    ],
    "user_data":
    [
        {
            "name": "user_name",
            "action": "faker",
            "faker_type": "name"
        },
        {
            "name": "user_email",
            "action": "faker",
            "faker_type": "email"
        }
    ],
    "custom_tables":
    [
        {
            "name": "wp_registration_log",
            "columns":
            [
                {
                    "name": "email",
                    "action": "faker",
                    "faker_type": "email"
                },
                {
                    "name": "IP",
                    "action": "faker",
                    "faker_type": "ipv4"
                }
            ]
        }
    ],
    "truncate_tables":
    [
        "gf_entry",
        "gf_entry_meta",
        "gf_entry_notes",
        "gf_form_view"
    ]
}
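
A config like the one above could be checked before any data is touched. The following is a sketch under assumptions, not the project's implementation: `iter_rules` and `validate` are hypothetical helpers, and the rules they enforce (faker needs faker_type, replace needs value) are inferred from the property descriptions in this issue.

```python
import json

VALID_ACTIONS = {"faker", "replace", "remove"}

# Sections whose entries nest rule lists under the given keys.
NESTED_SECTIONS = {
    "post_types": ("fields", "post_meta"),
    "taxonomies": ("terms", "term_meta"),
    "custom_tables": ("columns",),
}
# Sections that are themselves a flat list of rules.
FLAT_SECTIONS = ("options", "user_data")


def iter_rules(config: dict):
    """Yield (path, rule) pairs for every rule object in the config."""
    for section, keys in NESTED_SECTIONS.items():
        for entry in config.get(section, []):
            for key in keys:
                for rule in entry.get(key, []):
                    yield f"{section}.{entry.get('name')}.{key}", rule
    for section in FLAT_SECTIONS:
        for rule in config.get(section, []):
            yield section, rule


def validate(config: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the config passed."""
    errors = []
    for path, rule in iter_rules(config):
        # Rules are identified by "name", or by "key" for meta entries.
        label = rule.get("name") or rule.get("key")
        action = rule.get("action")
        if action not in VALID_ACTIONS:
            errors.append(f"{path}: {label}: unknown action {action!r}")
        elif action == "faker" and "faker_type" not in rule:
            errors.append(f"{path}: {label}: faker requires faker_type")
        elif action == "replace" and "value" not in rule:
            errors.append(f"{path}: {label}: replace requires value")
    for table in config.get("truncate_tables", []):
        if not isinstance(table, str):
            errors.append(f"truncate_tables: expected a table name, got {table!r}")
    return errors
```

Usage would be along the lines of `validate(json.load(open("wp-scrubber.json")))`, aborting the scrub if the returned list is non-empty.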

Describe alternatives you've considered

I considered other formats, such as YAML, but they're a lot less human-readable than JSON, which is why we went with that approach.

I think this approach gives us the most structure and flexibility, short of defining our own scrubbing code for each project.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@darylldoyle darylldoyle added the type:enhancement New feature or request. label Jan 24, 2024
@tlovett1
Member

Absolutely love this.

@csloisel csloisel self-assigned this Apr 19, 2024
@csloisel csloisel linked a pull request May 13, 2024 that will close this issue