
Defining and exploring privacy computation concepts and technologies.


coreystone/privacy-engineering-glossary


privacy-engineering-glossary


A

Aggregation

A statistical method that combines a collection of raw data records into a total or summary (such as a count, sum, or average). Useful for analytics and research. Offers a modicum of privacy protection, but aggregates remain susceptible to re-identification attacks, especially when a group is small.
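A minimal sketch of aggregation, assuming a small list of hypothetical records with `zip` and `diagnosis` fields. Counting records per ZIP code produces a summary, but a group with a count of 1 describes exactly one person, which is one way aggregates can still leak information.

```python
from collections import Counter

# Hypothetical raw records (illustrative only).
records = [
    {"zip": "02139", "diagnosis": "flu"},
    {"zip": "02139", "diagnosis": "asthma"},
    {"zip": "02139", "diagnosis": "flu"},
    {"zip": "94105", "diagnosis": "diabetes"},
]

def count_by(rows, key):
    """Aggregate raw rows into per-group counts."""
    return Counter(row[key] for row in rows)

def singleton_groups(counts):
    """Groups of size 1 describe a single individual and are a re-identification risk."""
    return [group for group, n in counts.items() if n == 1]

counts = count_by(records, "zip")
print(dict(counts))              # {'02139': 3, '94105': 1}
print(singleton_groups(counts))  # ['94105']
```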

Algorithmic fairness

The notion that algorithms should make decisions without "bias", ensuring equitable treatment across different demographic groups in a dataset.
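Fairness can be measured in several, sometimes conflicting, ways. As an illustrative sketch (one metric among many, not a definition from this glossary), the demographic parity difference compares the rate of positive decisions across groups; the decisions and group labels below are hypothetical.

```python
def selection_rate(decisions, groups, group):
    """Fraction of positive decisions (1s) received by members of `group`."""
    picked = [d for d, g in zip(decisions, groups) if g == group]
    return sum(picked) / len(picked)

def demographic_parity_difference(decisions, groups):
    """Largest gap in selection rate between any two groups (0.0 means parity)."""
    rates = [selection_rate(decisions, groups, g) for g in set(groups)]
    return max(rates) - min(rates)

# Hypothetical loan approvals (1 = approved) for applicants in groups "a" and "b".
decisions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_difference(decisions, groups))  # 0.5
```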

Anonymization

The "process by which personal data is altered in such a way that a data subject can no longer be identified directly or indirectly, either by the data controller alone or in collaboration with any other party" (ISO 25237:2017).

Authorization (AuthZ)

Determining what an authenticated user (or system) is permitted to access or do, based on the set of permissions granted to them.

Authentication (AuthN)

Ensuring a person (or system) is who they claim they are by verifying their identity.

  • Mechanisms include: asymmetric/public-key cryptography (certificate exchange), username/password login, biometrics, etc.
  • Related: Authorization (AuthZ)
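As an illustration of username/password login, the sketch below stores a salted PBKDF2 hash instead of the password and verifies login attempts against it using only Python's standard library. The iteration count is an assumption; real deployments should follow current guidance and usually rely on a vetted authentication library.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # illustrative work factor; tune to your hardware and policy

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Derive a salted hash to store instead of the plaintext password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-derive the hash from the login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```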

Availability

Data and systems remain accessible to authorized users when they need them.

B

Background knowledge attack

A type of attack against k-anonymity where an adversary uses external information to infer sensitive data about individuals from anonymized datasets.
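A minimal sketch using a hypothetical 3-anonymous table: the adversary knows which equivalence class the target falls into and, from outside information, can rule out one sensitive value. The remaining candidate set collapses to a single value, disclosing the target's condition even though k-anonymity holds.

```python
# Hypothetical 3-anonymous release: zip and age are generalized quasi-identifiers,
# "condition" is the sensitive attribute.
released = [
    {"zip": "130**", "age": "<30", "condition": "heart disease"},
    {"zip": "130**", "age": "<30", "condition": "viral infection"},
    {"zip": "130**", "age": "<30", "condition": "viral infection"},
    {"zip": "148**", "age": ">=40", "condition": "cancer"},
    {"zip": "148**", "age": ">=40", "condition": "flu"},
    {"zip": "148**", "age": ">=40", "condition": "heart disease"},
]

def candidate_values(rows, zip_code, age_band, ruled_out):
    """Sensitive values still possible for a target after applying background knowledge."""
    in_class = {r["condition"] for r in rows if r["zip"] == zip_code and r["age"] == age_band}
    return in_class - set(ruled_out)

# The adversary knows the target is in ("130**", "<30") and is very unlikely to have heart disease.
print(candidate_values(released, "130**", "<30", ruled_out=["heart disease"]))  # {'viral infection'}
```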

C

CIA triad

Confidentiality, Integrity, Availability. A foundational model of cybersecurity.

Confidentiality

Data is protected so as to prevent any unauthorized access, "whether intentional or accidental".

Consent management platform (CMP)

A data governance tool to obtain, record, map, and manage user consent for data collection and processing in compliance with privacy regulations.

D

Data classification

A hierarchy or class system used to assign risk or sensitivity levels to data, where higher levels indicate higher risk or sensitivity (and may prescribe stricter controls). The best-known example is the US government's classification system, whose levels are "Confidential", "Secret", and "Top Secret".
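Because classification levels form an ordered hierarchy, they map naturally onto an ordered type. The sketch below assumes a hypothetical four-level corporate scheme (not the US government's) and a simple rule that a reader may access data labeled at or below their clearance.

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    """Hypothetical scheme: a higher value means higher risk/sensitivity."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

def may_access(clearance: Sensitivity, label: Sensitivity) -> bool:
    """Allow access only to data labeled at or below the reader's clearance."""
    return clearance >= label

print(may_access(Sensitivity.INTERNAL, Sensitivity.PUBLIC))      # True
print(may_access(Sensitivity.INTERNAL, Sensitivity.RESTRICTED))  # False
```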

Data lineage

A complete history of the lifecycle of a data point, from beginning to end. This includes all transformations, movement, and other operations performed on the data.

Lineage (n): Descent in a line from a common progenitor

Data minimization

The principle of limiting data collection to only what is necessary for a specific purpose, reducing privacy "attack surface".
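A minimal sketch of enforcing data minimization at the point of use, assuming a hypothetical mapping from each processing purpose to the fields it actually needs; everything else is dropped before the data moves on.

```python
# Hypothetical mapping of processing purposes to the fields each one needs.
ALLOWED_FIELDS = {
    "shipping": {"name", "street", "city", "postal_code"},
    "analytics": {"postal_code", "order_total"},
}

def minimize(record: dict, purpose: str) -> dict:
    """Keep only the fields needed for `purpose`; drop everything else."""
    allowed = ALLOWED_FIELDS[purpose]
    return {k: v for k, v in record.items() if k in allowed}

order = {"name": "Ada", "email": "ada@example.com", "street": "1 Main St",
         "city": "Springfield", "postal_code": "12345", "order_total": 42.0,
         "birthdate": "1990-01-01"}
print(minimize(order, "analytics"))  # {'postal_code': '12345', 'order_total': 42.0}
```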

Data provenance

A high-level overview of "where the data comes from."

Provenance (n): Place of origin; derivation.

Data quality

The accuracy, completeness, reliability, and relevance of data, which matter for compliance and for analytical efforts. It is important to avoid storing "stale" (outdated) data.

Data tagging

The process of labeling data with metadata to enhance its organization, retrieval, and compliance with privacy regulations.

  • Distinct, machine-readable inputs used to filter data across systems
  • Usually expressed as (or similar to) a regular expression
  • Read more: Data Tagging: What You Need to Know
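Since tags are often expressed as regular expressions, a minimal tagging pass can be sketched as below. The tag names and patterns are hypothetical and deliberately simple; real patterns need far more care (formats, locales, validation).

```python
import re

# Hypothetical tag definitions (illustrative, not production-grade patterns).
TAG_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def tag_value(value: str) -> set[str]:
    """Return the names of every tag whose pattern matches somewhere in `value`."""
    return {tag for tag, pattern in TAG_PATTERNS.items() if pattern.search(value)}

def tag_columns(rows: list[dict]) -> dict[str, set[str]]:
    """Collect tags per column so downstream systems can filter on them."""
    tags: dict[str, set[str]] = {}
    for row in rows:
        for column, value in row.items():
            tags.setdefault(column, set()).update(tag_value(str(value)))
    return tags

rows = [{"contact": "ada@example.com", "note": "call 555-123-4567"},
        {"contact": "bob@example.org", "note": "SSN on file: 123-45-6789"}]
print(tag_columns(rows))
```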

De-identification

The general process of removing or altering personal information in a dataset to prevent the identification of individuals, while still allowing for useful data analysis.

Delta (δ) presence

A privacy model that limits how confidently an adversary can determine whether a specific individual is present in a dataset, by bounding the probability of presence between two thresholds (δmin and δmax).

  • "This is slightly different than re-identification risk in that the goal is not to find which exact record corresponds [to] which individual, only to know whether an individual is part of the dataset."
  • Read more: About δ-presence
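A minimal sketch of checking δ-presence, assuming the common formulation in which the adversary knows a public population table and the generalized release. For each generalized class, the estimated probability that a person from the population is in the release is the number of released records divided by the number of population records in that class. All data below is hypothetical.

```python
from collections import Counter

def presence_probabilities(public_qids, released_qids):
    """Per generalized quasi-identifier class: (# released records) / (# people in the population)."""
    population = Counter(public_qids)
    released = Counter(released_qids)
    return {qid: released[qid] / population[qid] for qid in population}

def is_delta_present(probabilities, delta_min, delta_max):
    """True if every class's presence probability lies within [delta_min, delta_max]."""
    return all(delta_min <= p <= delta_max for p in probabilities.values())

# Hypothetical generalized (zip, age band) tuples for the whole public population...
public = [("021**", "30-39")] * 4 + [("021**", "40-49")] * 2
# ...and for the subset of people who appear in the released dataset.
released = [("021**", "30-39")] * 2 + [("021**", "40-49")] * 2

probs = presence_probabilities(public, released)
print(probs)  # {('021**', '30-39'): 0.5, ('021**', '40-49'): 1.0}
print(is_delta_present(probs, 0.0, 0.6))  # False: everyone aged 40-49 is known to be in the release
```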

Deterministic algorithm

An algorithm that "always returns the same result given the same input parameters in the same state" of a dataset.

Differential privacy

A privacy-preserving mathematical framework that introduces calibrated noise into query results so that the output reveals almost nothing about whether any single individual's entry is in the dataset, while keeping the results of statistical queries as accurate as possible.
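The classic way to introduce noise is the Laplace mechanism. The sketch below releases a noisy count: a counting query changes by at most 1 when one person is added or removed, so Laplace noise with scale 1/ε gives ε-differential privacy. It is illustrative only; Python's `random` module is not cryptographically secure and naive floating-point sampling has known pitfalls, so production systems use vetted DP libraries.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Noisy count via the Laplace mechanism (a counting query has sensitivity 1)."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)

ages = [23, 35, 41, 52, 67, 29, 44]  # hypothetical data
rng = random.Random(0)               # seeded only to make this demo reproducible
print(dp_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng))  # true count is 4, plus noise
```

Smaller ε means more noise and stronger privacy; repeated queries consume additional privacy budget.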

Downcoding attack

An attack against "quasi-identifier-based deidentification techniques (QI-deidentification)[,] including k-anonymity, l-diversity, and t-closeness."

F

Federated learning

A decentralized machine learning approach in which disparate data sources (nodes) collaboratively train a central model: each node trains on its own data and shares only model updates with the central (federated) server, so the training data never leaves its source.
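A minimal sketch of federated averaging, one common federated learning algorithm, assuming a one-parameter linear model y ≈ w·x and two hypothetical clients. Each client runs a few gradient steps on its private data; the server only receives the resulting weights and averages them, weighted by dataset size.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """A few steps of gradient descent on one client's private (x, y) pairs for y ≈ w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    """Clients train locally; the server sees only weights and averages them by dataset size."""
    total = sum(len(data) for data in clients)
    return sum(local_update(global_w, data) * len(data) for data in clients) / total

# Hypothetical private datasets that never leave their clients.
clients = [
    [(1.0, 2.1), (2.0, 3.9)],
    [(3.0, 6.2), (4.0, 7.8), (5.0, 10.1)],
]
w = 0.0
for _ in range(20):
    w = federated_round(w, clients)
print(round(w, 2))
```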

H

Homogeneity attack

An attack on a k-anonymous dataset where the attacker exploits "groups that leak information due to lack of diversity in the sensitive attribute." To address this, a sanitized table should be “diverse”, where "all tuples that share the same values of their quasi-identifiers should have diverse values for their sensitive attributes."

Homomorphic encryption

A "method to encrypt data and perform operations on it" without having to decrypt the data. Particular use cases include financial, health, and other environments with highly sensitive data.
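To illustrate computing on encrypted data, the sketch below implements a toy version of the Paillier cryptosystem, which is additively (partially) homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The tiny hard-coded primes are assumptions chosen for readability and make it completely insecure; fully homomorphic schemes that also support multiplication are far more involved.

```python
import math
import secrets

# Toy Paillier keypair with tiny primes: illustration only, NOT secure.
P, Q = 293, 433
N = P * Q
N2 = N * N
G = N + 1
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because G = N + 1 makes L(G^LAM mod N^2) = LAM mod N

def encrypt(m: int) -> int:
    """c = G^m * r^N mod N^2, using fresh randomness r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2

def decrypt(c: int) -> int:
    """m = L(c^LAM mod N^2) * MU mod N, where L(x) = (x - 1) // N."""
    return ((pow(c, LAM, N2) - 1) // N) * MU % N

def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N2

a, b = encrypt(120), encrypt(75)
print(decrypt(add_encrypted(a, b)))  # 195, computed without decrypting a or b
```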

I

Insecure direct object reference (IDOR)

A "vulnerability that arises when attackers can access or modify objects by manipulating identifiers used in a web application's URLs or parameters." Poor access controls fail to verify if a user is authorized to access data (such as that of another user).
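A minimal sketch of the vulnerability and its fix, assuming a hypothetical in-memory invoice store. The vulnerable handler trusts the identifier from the URL; the fixed handler checks that the authenticated user is authorized for that specific object.

```python
class Forbidden(Exception):
    """Raised when the current user may not access the requested object."""

# Hypothetical store: invoice id -> record with an owner.
INVOICES = {
    101: {"owner": "alice", "amount": 40},
    102: {"owner": "bob", "amount": 95},
}

def get_invoice_vulnerable(current_user: str, invoice_id: int) -> dict:
    """IDOR: alice can read /invoices/102 just by changing the number in the URL."""
    return INVOICES[invoice_id]

def get_invoice(current_user: str, invoice_id: int) -> dict:
    """Fixed: verify ownership of this specific object before returning it."""
    invoice = INVOICES.get(invoice_id)
    # Same error for "missing" and "not yours", so ids cannot be probed for existence.
    if invoice is None or invoice["owner"] != current_user:
        raise Forbidden(f"{current_user} may not access invoice {invoice_id}")
    return invoice
```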

Integrity

Data is free from corruption, manipulation, or other unknown modifications. Data is "authentic, accurate, and reliable".
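One common way to detect unauthorized modification is a message authentication code. The sketch below uses HMAC-SHA256 from Python's standard library, assuming a hypothetical secret key shared by the writer and the verifier; any change to the record changes its tag.

```python
import hashlib
import hmac
import secrets

KEY = secrets.token_bytes(32)  # hypothetical key shared by writer and verifier

def sign(message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag for the message."""
    return hmac.new(KEY, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(message), tag)

record = b'{"patient_id": 17, "dose_mg": 5}'
tag = sign(record)
print(verify(record, tag))                                # True
print(verify(b'{"patient_id": 17, "dose_mg": 50}', tag))  # False: tampering detected
```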

K

K-anonymity

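"A property of anonymized data" with "scientific guarantees that the individuals who are the subjects of the data cannot be re-identified while the data remain practically useful." Concretely, a dataset is k-anonymous when every record is indistinguishable from at least k − 1 other records on its quasi-identifiers (attributes such as ZIP code, age, or sex that could be linked to external data).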
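A minimal sketch of measuring k, assuming hypothetical rows whose quasi-identifiers (`zip`, `age`) are already generalized: k is the size of the smallest group of rows that share the same quasi-identifier values.

```python
from collections import Counter

def k_of(rows, quasi_identifiers):
    """Size of the smallest equivalence class over the given quasi-identifiers."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())

rows = [
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "021**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "946**", "age": "40-49", "diagnosis": "diabetes"},
    {"zip": "946**", "age": "40-49", "diagnosis": "flu"},
]
print(k_of(rows, ["zip", "age"]))  # 2: this table is 2-anonymous
```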

K-map

A similar approach to k-anonymity, "except that it assumes that the attacker most likely doesn't know who is in the dataset."

L

ℓ-diversity

An attempt to address k-anonymity attacks (such as homogeneity and background knowledge attacks), which "attempts to measure how much an attacker can learn about people in terms of k-anonymity and equivalence classes".
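A minimal sketch of the simplest variant, distinct ℓ-diversity (other variants, such as entropy and recursive ℓ-diversity, are stricter): ℓ is the fewest distinct sensitive values found in any equivalence class. The rows are hypothetical; a class where every record has the same sensitive value (ℓ = 1) is exactly what a homogeneity attack exploits.

```python
from collections import defaultdict

def distinct_l(rows, quasi_identifiers, sensitive):
    """Fewest distinct sensitive values in any equivalence class."""
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].add(row[sensitive])
    return min(len(values) for values in classes.values())

rows = [
    {"zip": "130**", "age": "<30", "condition": "flu"},
    {"zip": "130**", "age": "<30", "condition": "flu"},
    {"zip": "130**", "age": "<30", "condition": "flu"},
    {"zip": "148**", "age": ">=40", "condition": "cancer"},
    {"zip": "148**", "age": ">=40", "condition": "flu"},
    {"zip": "148**", "age": ">=40", "condition": "heart disease"},
]
print(distinct_l(rows, ["zip", "age"], "condition"))  # 1: the first class is homogeneous
```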

N

Non-deterministic algorithm

An algorithm that "does not necessarily always return the same result given the same input parameters in the same state" of a dataset. (Stack Overflow)
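In privacy work the distinction matters because deterministic outputs are linkable: the same input always produces the same value across datasets. The sketch below contrasts a deterministic hash-based pseudonym with a random token; both function names are hypothetical, and an unsalted hash of a guessable identifier such as an email address can be reversed by guessing inputs.

```python
import hashlib
import secrets

def deterministic_pseudonym(email: str) -> str:
    """Same input, same output, every time (deterministic)."""
    return hashlib.sha256(email.encode()).hexdigest()[:12]

def random_token() -> str:
    """Fresh randomness on every call (non-deterministic)."""
    return secrets.token_hex(6)

print(deterministic_pseudonym("ada@example.com") == deterministic_pseudonym("ada@example.com"))  # True
print(random_token() == random_token())  # almost certainly False
```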

P

Privacy by design

A collection of data privacy principles proposed by Dr. Ann Cavoukian to take a "proactive approach to privacy that emphasises the need to incorporate data protection practices into projects and decisions at the outset, rather than as an afterthought."

Privacy-enhancing technology (PET)

An assortment of tools that enable businesses to comply with privacy regulations while preserving individuals' privacy and the utility of their datasets, for purposes such as analytics, development, and sharing.

Private set intersection (PSI)

Also called "double encryption". A type of secure multiparty computation "in which each party has a set of items and the goal is to learn the intersection of those sets while revealing nothing else about those sets."
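One classic construction, Diffie-Hellman-based PSI, illustrates the "double encryption" idea: each party raises hashed items to its own secret exponent, and because exponentiation commutes, an item held by both parties ends up with the same doubly encrypted value. This is a toy sketch under stated assumptions (a simple hash-to-integer mapping, a Mersenne prime modulus, honest parties, both roles simulated in one process) and is not secure for real use.

```python
import hashlib
import secrets

PRIME = 2**127 - 1  # toy modulus; real protocols use vetted elliptic-curve groups

def hash_to_group(item: str) -> int:
    """Map an item to a nonzero integer mod PRIME."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % (PRIME - 1) + 1

def blind(items, secret):
    """First 'encryption': raise each hashed item to a secret exponent."""
    return [pow(hash_to_group(x), secret, PRIME) for x in items]

def reblind(values, secret):
    """Second 'encryption' by the other party; (h^a)^b == (h^b)^a mod PRIME."""
    return [pow(v, secret, PRIME) for v in values]

alice_items = ["ann@x.com", "bob@x.com", "cy@x.com"]
bob_items = ["bob@x.com", "cy@x.com", "dee@x.com"]
a = secrets.randbelow(PRIME - 2) + 1  # Alice's secret exponent
b = secrets.randbelow(PRIME - 2) + 1  # Bob's secret exponent

# Alice sends H(x)^a; Bob returns H(x)^(a*b) in order and sends his own H(y)^b for Alice to raise to a.
alice_double = reblind(blind(alice_items, a), b)
bob_double = set(reblind(blind(bob_items, b), a))

# Alice learns which of her items are shared, and only blinded values for Bob's other items.
intersection = [x for x, d in zip(alice_items, alice_double) if d in bob_double]
print(intersection)  # ['bob@x.com', 'cy@x.com']
```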

Pseudonymization (tokenization)

A form of de-identification in which personal identifiers are replaced with placeholder values (or "tokens"). Unlike anonymization, pseudonymization is reversible: the data can still be linked back to an individual by anyone who holds the additional information (such as the mapping from tokens to identifiers).
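A minimal sketch of tokenization with a hypothetical token vault: identifiers are replaced with random tokens, and the mapping is kept separately so that authorized parties can reverse it. Reusing the same token for the same identifier keeps pseudonymized records linkable to each other.

```python
import secrets

class TokenVault:
    """Hypothetical vault mapping identifiers to random tokens and back."""

    def __init__(self):
        self._forward = {}  # identifier -> token
        self._reverse = {}  # token -> identifier

    def tokenize(self, identifier: str) -> str:
        """Return the existing token for `identifier`, or mint a new random one."""
        if identifier not in self._forward:
            token = "tok_" + secrets.token_hex(8)
            self._forward[identifier] = token
            self._reverse[token] = identifier
        return self._forward[identifier]

    def detokenize(self, token: str) -> str:
        """Reverse the mapping (only holders of the vault can do this)."""
        return self._reverse[token]

vault = TokenVault()
record = {"email": "ada@example.com", "purchase": "book"}
pseudonymized = {**record, "email": vault.tokenize(record["email"])}
print(pseudonymized)                             # email replaced by a random token
print(vault.detokenize(pseudonymized["email"]))  # ada@example.com
```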

R

Re-identification

An attack to identify individuals in an "anonymized" dataset using external information and computing techniques to link individuals to their "de-identified" personal information.
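A minimal sketch of a linkage attack, the classic form of re-identification: a "de-identified" dataset with names removed is joined to a public dataset on shared quasi-identifiers (here ZIP code, birth date, and sex). Both datasets are hypothetical; a unique match ties a name to a sensitive attribute.

```python
# Hypothetical "de-identified" health records: names removed, quasi-identifiers kept.
health = [
    {"zip": "02138", "birthdate": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birthdate": "1980-02-14", "sex": "M", "diagnosis": "asthma"},
]
# Hypothetical public dataset (e.g., a voter roll) with names and the same attributes.
voters = [
    {"name": "Jane Roe", "zip": "02138", "birthdate": "1945-07-31", "sex": "F"},
    {"name": "John Doe", "zip": "02140", "birthdate": "1975-05-05", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birthdate", "sex")

def link(deidentified, public):
    """Join on quasi-identifiers; a unique match re-identifies the record."""
    index = {}
    for person in public:
        key = tuple(person[q] for q in QUASI_IDENTIFIERS)
        index.setdefault(key, []).append(person["name"])
    matches = []
    for rec in deidentified:
        names = index.get(tuple(rec[q] for q in QUASI_IDENTIFIERS), [])
        if len(names) == 1:
            matches.append((names[0], rec["diagnosis"]))
    return matches

print(link(health, voters))  # [('Jane Roe', 'hypertension')]
```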

S

Secure multi-party computation (S/MPC)

Also known as "privacy-preserving computation". [Any] "cryptographic protocol that distributes a computation across multiple parties where no individual party can see the other parties’ data."
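A minimal sketch of one MPC building block, additive secret sharing, used here to compute a sum. Each party splits its private input into random shares that add up to the input modulo a large number; any incomplete set of a party's shares looks random, yet the parties' partial sums combine into the true total. The parties are simulated in one process, and the modulus is an assumption.

```python
import secrets

MODULUS = 2**61 - 1  # assumed; any modulus larger than the true result works

def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_inputs: list[int]) -> int:
    n = len(private_inputs)
    # Each party splits its input and sends share j to party j.
    all_shares = [share(x, n) for x in private_inputs]
    # Party j adds the shares it received; it only ever sees random-looking numbers.
    partial_sums = [sum(s[j] for s in all_shares) % MODULUS for j in range(n)]
    # Publishing the partial sums reveals only the total.
    return sum(partial_sums) % MODULUS

salaries = [85_000, 92_000, 78_000]  # hypothetical; each known only to its owner
print(secure_sum(salaries))  # 255000
```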

Scream test

  • A test to determine whether access to a resource (like personal information, a server, etc.) is still necessary by shutting off access to it entirely. If nobody screams bloody murder, they didn't really need that resource. This can be applied to enforce data minimization.
  • Read more: Microsoft uses a scream test to silence its unused servers

Software development lifecycle (SDLC)

The process that an organization follows to design, develop, test, deploy, and maintain software. "Shifting privacy left" is the idea of ingraining privacy earlier in the SDLC, during product ideation and requirements drafting, in order to achieve privacy by design.

Structured data

Static code analysis

A process of scanning a product's source code to identify personal data and understand how and where it is collected and processed throughout systems.
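A minimal sketch of the idea using Python's `ast` module: parse source code without running it and flag variables, attributes, arguments, and string keys whose names suggest personal data. The naming heuristic is a hypothetical assumption; real tools combine richer classifiers with data-flow analysis.

```python
import ast
import re

# Hypothetical naming heuristic for personal data.
PII_NAME = re.compile(r"(email|phone|ssn|birth|address|first_?name|last_?name)", re.IGNORECASE)

def find_pii_identifiers(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, name) pairs for identifiers and string constants that look like personal data."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            name = node.id
        elif isinstance(node, ast.Attribute):
            name = node.attr
        elif isinstance(node, ast.arg):
            name = node.arg
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            name = node.value
        else:
            continue
        if PII_NAME.search(name):
            hits.append((node.lineno, name))
    return sorted(hits)

sample = '''
def signup(user_email, password):
    record = {"email": user_email, "plan": "free"}
    db.save(record)
    send_welcome(user.phone_number)
'''
for line, name in find_pii_identifiers(sample):
    print(line, name)
```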

Synthetic data

T

Threat modeling

A process (originally from cybersecurity) to identify and understand threats to a system and their mitigations. In the context of privacy, this relates to threats to the personal information in a system and to the associated data subjects.

Trusted execution environment (TEE)

A segregated safe zone within a CPU where only signed code within the environment can be loaded, and all code is "processed in the clear but is only visible in encrypted form when anything outside tries to access it." This ensures that "even if a system is compromised, the data within the TEE remains secure."

U

Unstructured data

Z

Zero knowledge proof

A "cryptographic mechanism that allows anyone to prove the truth of a statement without having to share the information in a statement."
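zk-SNARKs are non-interactive and far more complex; a simpler way to see the idea is the classic interactive Schnorr identification protocol, which is (honest-verifier) zero-knowledge. The prover convinces the verifier that it knows a secret x with y = G^x mod P without revealing x. The toy parameters below are assumptions for illustration and are not secure.

```python
import secrets

P = 2**127 - 1  # toy prime modulus; real systems use vetted groups (e.g., elliptic curves)
G = 3           # toy base

class Prover:
    def __init__(self, x: int):
        self.x = x  # the secret ("witness")

    def commit(self) -> int:
        """Step 1: pick random r and send t = G^r."""
        self.r = secrets.randbelow(P - 1)
        return pow(G, self.r, P)

    def respond(self, c: int) -> int:
        """Step 3: send s = r + c*x mod (P - 1); the random r masks x."""
        return (self.r + c * self.x) % (P - 1)

def verify(y: int, prover: Prover) -> bool:
    """One round for the statement 'I know x such that G^x = y (mod P)'."""
    t = prover.commit()
    c = secrets.randbelow(2**64)  # Step 2: verifier's random challenge
    s = prover.respond(c)
    # Step 4: G^s == t * y^c holds when the prover used the correct x.
    return pow(G, s, P) == (t * pow(y, c, P)) % P

x = secrets.randbelow(P - 1)
y = pow(G, x, P)  # public statement
print(verify(y, Prover(x)))      # True
print(verify(y, Prover(x + 1)))  # False (with overwhelming probability)
```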

zk-SNARK

  • Zero-Knowledge Succinct Non-Interactive Argument of Knowledge
  • Allows a Prover "[to create] a unique fingerprint for each proof [...] making it impossible to reverse-engineer the original statement. Essentially, these polynomial equations are solvable only by the Prover but verifiable by anyone. They're a puzzle to which only the Prover knows the answer, yet anyone can confirm the answer is correct without knowing what it is."
  • Read more:
