Dynamic data masking #537

mycaule · 2023-12-11T17:40:31Z

Do you have ideas on how to implement dynamic data masking with Athena, macros or Trino functions?

Some users started from how Snowflake implemented it and tried to apply the principles in Redshift.
https://discourse.getdbt.com/t/how-to-implement-dynamic-data-masking-on-redshift/2043

nicor88 · 2023-12-11T18:37:22Z

@mycaule Could you tell me a bit more about the use case you are trying to solve? Why do you require dynamic data masking?

In Athena we can leverage Lake Formation tagging. This means that data masking is not strictly necessary, because some columns can simply be excluded from user access with the right Lake Formation tag.

dbt-athena users can override column tags as part of model configs. For example, sensitive columns containing PII can be tagged accordingly, and IAM principals without access to the PII tag simply won't see those columns.

2 replies

mycaule Dec 14, 2023
Author

I would like to allow users to run SELECT phone_number FROM AwsCatalog.datamart.users
so that it returns only a truncated phone number like 06*****.

Other use cases are names and social security numbers.
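
To pin down the expected behavior, here is a minimal sketch of that masking rule as a plain Python function. The function name and the choice of keeping two visible characters are assumptions for illustration; in Athena the same rule would have to be expressed in SQL (for example inside a view) or in a UDF.

```python
from typing import Optional


def mask_value(value: Optional[str], visible: int = 2, mask_char: str = "*") -> Optional[str]:
    """Keep the first `visible` characters and replace the rest with `mask_char`.

    Hypothetical helper: the thread only gives "06*****" as the desired shape,
    so the exact number of visible characters is an assumption.
    """
    if value is None:
        # Preserve NULLs instead of turning them into an empty mask.
        return None
    hidden = max(len(value) - visible, 0)
    return value[:visible] + mask_char * hidden


if __name__ == "__main__":
    print(mask_value("0612345678"))
```

The same helper would cover names and social security numbers by changing `visible`.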

nicor88 Dec 14, 2023

I discovered that select current_user works in Athena, but it just returns the AWS account_id.

One possible way would be to deploy a UDF that returns the real user name (or the caller ARN), and then create views that do the dynamic data masking using something like:

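-- md5() in Athena/Trino takes and returns varbinary, and both CASE branches
-- must have the same type, so convert explicitly (my_udf_to_pick_user is hypothetical)
case
  when my_udf_to_pick_user() in ('user_full_access') then cast(birthday as varchar)
  else to_hex(md5(to_utf8(cast(birthday as varchar))))
end as masked_birthday,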

I don't see another way to get the dynamic behavior other than using views on top of tables, as suggested by the link that you posted.

jessedobbelaere · 2023-12-14T12:22:38Z

Another option for masking is to use S3 Object Lambda (https://aws.amazon.com/blogs/storage/automatically-modify-data-you-are-querying-with-amazon-athena-using-amazon-s3-object-lambda/). The downside is that the masking is not managed from dbt.

1 reply

nicor88 Dec 14, 2023

Nice 💯
The creation of such a Lambda must happen outside dbt, but dbt-external-tables (dbt-labs/dbt-external-tables#203) can handle the creation of the external table that calls the S3 endpoint.

Also, can the Lambda behind the S3 endpoint know who is calling it? That should be possible, right?
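
On that last question: the S3 Object Lambda event includes a userIdentity block describing the principal that made the GetObject request, so the handler can branch on it. Below is a rough sketch, not a tested deployment. It assumes the underlying objects are CSV (Parquet would need a different reader), and the allow-list ARN, the column list and the helper names are hypothetical. Whether the identity seen by the Lambda is the end user or a role used on Athena's behalf depends on the setup and should be verified.

```python
import csv
import io
import urllib.request

# Hypothetical allow-list of principals that may read unmasked data.
FULL_ACCESS_ARNS = {"arn:aws:iam::123456789012:role/user_full_access"}
# Column taken from the example earlier in the thread; extend as needed.
SENSITIVE_COLUMNS = ("phone_number",)


def caller_arn(event: dict) -> str:
    """Return the ARN of the principal that issued the GetObject call, if present."""
    return event.get("userIdentity", {}).get("arn", "")


def mask_csv(body: str, columns=SENSITIVE_COLUMNS, visible: int = 2) -> str:
    """Mask the given columns of a CSV document, keeping `visible` leading characters."""
    reader = csv.DictReader(io.StringIO(body))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames or [], lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for col in columns:
            if row.get(col):
                hidden = max(len(row[col]) - visible, 0)
                row[col] = row[col][:visible] + "*" * hidden
        writer.writerow(row)
    return out.getvalue()


def handler(event, context):
    """S3 Object Lambda entry point: fetch the original object, mask it, return it."""
    import boto3  # imported lazily so the helpers above run without the AWS SDK

    ctx = event["getObjectContext"]
    # inputS3Url is a presigned URL for the original, untransformed object.
    with urllib.request.urlopen(ctx["inputS3Url"]) as resp:
        body = resp.read().decode("utf-8")

    if caller_arn(event) not in FULL_ACCESS_ARNS:
        body = mask_csv(body)

    boto3.client("s3").write_get_object_response(
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
        Body=body.encode("utf-8"),
    )
    return {"statusCode": 200}
```

Keeping the masking in pure functions (caller_arn, mask_csv) means the rule can be unit-tested locally, while only the handler touches AWS.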
