Defining and exploring privacy computation concepts and technologies.
A statistical method that combines a collection of raw data and outputs it as a total or summary. Useful for analytics and research. Offers a modicum of privacy protection, but remains susceptible to re-identification attacks.
- Read more: What Is Data Aggregation?
- Related: Re-identification
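A minimal sketch of aggregation over a hypothetical table of salaries, illustrating both the summary output and why small groups undermine the privacy benefit; the departments and figures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical raw records: (department, salary)
records = [("eng", 120_000), ("eng", 110_000), ("eng", 130_000), ("legal", 150_000)]

totals, counts = defaultdict(int), defaultdict(int)
for dept, salary in records:
    totals[dept] += salary
    counts[dept] += 1

for dept in totals:
    print(dept, totals[dept] / counts[dept], f"(n={counts[dept]})")
# The 'legal' average is based on a single person, so the "aggregate"
# discloses that individual's salary -- one reason aggregation alone
# remains susceptible to re-identification.
```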
The notion that algorithms should make decisions without "bias", ensuring equitable treatment across different demographic groups in a dataset.
- Read more: What is Algorithm Fairness?
The "process by which personal data is altered in such a way that a data subject can no longer be identified directly or indirectly, either by the data controller alone or in collaboration with any other party" (ISO 25237:2017).
- Data is erased or overwritten irreversibly in an attempt to delink it from an individual.
- Related: De-identification, Pseudonymization
A set of permissions for an authenticated user.
- Mechanisms include: Role-based access control (RBAC), Policy-based access control (PBAC)
- Read more: Authn vs. authz: How are they different?
- Related: Authentication (AuthN)
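A minimal sketch of role-based access control (RBAC) with hypothetical roles and permissions; real systems typically delegate this to an identity provider or policy engine.

```python
# Minimal role-based access control (RBAC) sketch.
# Roles and permission names here are hypothetical examples.
ROLE_PERMISSIONS = {
    "analyst": {"dataset:read"},
    "engineer": {"dataset:read", "dataset:write"},
    "admin": {"dataset:read", "dataset:write", "dataset:delete"},
}

def is_authorized(role: str, permission: str) -> bool:
    """Return True if the given role grants the requested permission."""
    return permission in ROLE_PERMISSIONS.get(role, set())

# An authenticated user's role decides what they may do (authorization),
# independently of how their identity was verified (authentication).
print(is_authorized("analyst", "dataset:write"))  # False
print(is_authorized("admin", "dataset:delete"))   # True
```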
Ensuring a person (or system) is who they claim to be by verifying their identity.
- Mechanisms include: Asymmetric/public-key cryptography (certificate exchange), username/password login, biometrics, etc.
- Related: Authorization (AuthZ)
- Data is readily served upon request without delay or downtime.
- Read more: What is the CIA Triad and Why is it important?
- Related: Confidentiality, Integrity, CIA triad
A type of attack against k-anonymity where an adversary uses external information to infer sensitive data about individuals from anonymized datasets.
- Read more:
- Related: K-anonymity, Re-identification
Confidentiality, Integrity, Availability. A foundational model of cybersecurity.
- Read more: What is the CIA Triad and Why is it important?
- Related: Confidentiality, Integrity, Availability
Data is protected so as to prevent any unauthorized access, "whether intentional or accidental".
- Read more: What is the CIA Triad and Why is it important?
- Related: CIA triad, Integrity, Availability
A data governance tool to obtain, record, map, and manage user consent for data collection and processing in compliance with privacy regulations.
A hierarchy or class system used to assign risk or sensitivity levels to data, where higher levels indicate higher risk or sensitivity (and potentially prescribe stricter controls). The most popular example is the US government's information classification system, which includes levels such as "Confidential", "Secret", and "Top Secret".
A complete history of the lifecycle of a data point, from beginning to end. This includes all transformations, movement, and other operations performed on the data.
Lineage (n): Descent in a line from a common progenitor
- Read more:
- Related: Data provenance
The principle of limiting data collection to only what is necessary for a specific purpose, reducing privacy "attack surface".
A high-level overview of "where the data comes from."
Provenance (n): Place of origin; derivation.
- Read more:
- Related: Data lineage
The relative accuracy, completeness, reliability, and relevance of data, which is important for compliance and for analytics. Maintaining data quality includes avoiding the storage of "stale" data.
- Read more: Understanding Data Quality
The process of labeling data with metadata to enhance its organization, retrieval, and compliance with privacy regulations.
- Distinct, machine-readable inputs used to filter data across systems
- Usually represented as, or similarly to, a regular expression
- Read more: Data Tagging: What You Need to Know
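A minimal sketch of pattern-based tagging using simplified, hypothetical regular expressions for email addresses and US Social Security numbers; production taggers combine patterns with dictionaries and trained classifiers.

```python
import re

# Hypothetical, simplified tag patterns; real classifiers are more robust.
TAG_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def tag_record(value: str) -> set:
    """Return the set of tags whose pattern matches the value."""
    return {tag for tag, pattern in TAG_PATTERNS.items() if pattern.search(value)}

print(tag_record("Contact: jane.doe@example.com"))  # {'email'}
print(tag_record("SSN on file: 123-45-6789"))       # {'us_ssn'}
```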
The general process of removing or altering personal information from a dataset to prevent the identification of individuals, while still allowing for useful data analysis.
- Read more: De-identification: A Primer
- Related: Anonymization
A mathematical measure of how well an adversary can determine whether a specific individual is present in a dataset.
- "This is slightly different than re-identification risk in that the goal is not to find which exact record corresponds which individual, only to know whether an individual is part of the dataset."
- Read more: About δ-presence
An algorithm that "always returns the same result given the same input parameters in the same state" of a dataset.
- Read more:
- Related: Non-deterministic algorithm
A privacy-preserving mathematical framework that maximizes the accuracy of queries from statistical databases while minimizing the chances of identifying individual entries in the dataset, typically by introducing calibrated noise.
- Read: A Survey of Differential Privacy Frameworks
- Watch: What is Differential Privacy?
- Related: Privacy-enhancing technology (PET)
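A minimal sketch of the Laplace mechanism, one common way to answer a counting query with ε-differential privacy; the parameters are illustrative, and production systems should rely on a vetted library rather than hand-rolled noise.

```python
import numpy as np

def dp_count(true_count: int, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    Adding or removing one person changes a count by at most 1 (sensitivity = 1),
    so Laplace noise with scale 1/epsilon is sufficient.
    """
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Smaller epsilon -> more noise -> stronger privacy, lower accuracy.
print(dp_count(true_count=10_000, epsilon=1.0))
print(dp_count(true_count=10_000, epsilon=0.1))
```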
An attack against "quasi-identifier-based deidentification techniques (QI-deidentification)[,] including k-anonymity, l-diversity, and t-closeness."
- Read more:
- Watch: USENIX Security '22 - Attacks on Deidentification's Defenses
A decentralized machine learning approach that enables disparate data sources (nodes) to collaboratively train a central model, without having training data ever leave any data source or be sent to the central (federated) server.
- Read more:
- Related: Privacy-enhancing technology (PET)
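A minimal sketch of federated averaging over a toy linear model with hypothetical client datasets; real deployments add secure aggregation, handle non-IID data, and often combine this with differential privacy.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, steps=10):
    """One client's gradient-descent steps on its own data (the data never leaves)."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w = w - lr * grad
    return w

def federated_average(global_w, client_datasets):
    """The central server averages client updates, weighted by local dataset size."""
    updates = [local_update(global_w, X, y) for X, y in client_datasets]
    sizes = [len(y) for _, y in client_datasets]
    return np.average(updates, axis=0, weights=sizes)

# Two hypothetical clients hold private data; only model weights are exchanged.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(2)]
w = np.zeros(3)
for _ in range(5):
    w = federated_average(w, clients)
print(w)  # the collaboratively trained global weights
```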
An attack on a k-anonymous dataset where the attacker exploits "groups that leak information due to lack of diversity in the sensitive attribute." To address this, a sanitized table should be “diverse”, where "all tuples that share the same values of their quasi-identifiers should have diverse values for their sensitive attributes."
- Read more: ℓ-Diversity: Privacy Beyond k-Anonymity
- Related: K-anonymity, Background knowledge attack
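A minimal sketch that measures the diversity of a sensitive attribute inside each quasi-identifier group, using hypothetical records; a group with only one distinct sensitive value is exactly what a homogeneity attack exploits.

```python
from collections import defaultdict

# Hypothetical records: (age_band, zip_prefix) are quasi-identifiers; diagnosis is sensitive.
records = [
    ("20-29", "123**", "flu"),
    ("20-29", "123**", "flu"),
    ("30-39", "456**", "flu"),
    ("30-39", "456**", "cancer"),
]

groups = defaultdict(set)
for age_band, zip_prefix, diagnosis in records:
    groups[(age_band, zip_prefix)].add(diagnosis)

for quasi_ids, sensitive_values in groups.items():
    l = len(sensitive_values)
    print(quasi_ids, f"l={l}", "VULNERABLE to homogeneity attack" if l == 1 else "")
# ('20-29', '123**') has l=1: anyone known to be in that group has the flu.
```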
A "method to encrypt data and perform operations on it" without having to decrypt the data. Particular use cases include financial, health, and other environments with highly sensitive data.
- There are several types of homomorphic encryption
- A popular example is the asymmetric algorithm RSA, which is partially homomorphic
- Read more:
- Related: Privacy-enhancing technology (PET)
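A minimal sketch of textbook RSA's multiplicative homomorphism (the partially homomorphic property mentioned above), with tiny, insecure toy parameters chosen only for readability; real systems use padded RSA or fully homomorphic libraries.

```python
# Textbook RSA with toy parameters (insecure; for illustration only).
p, q = 61, 53
n = p * q                 # 3233
phi = (p - 1) * (q - 1)   # 3120
e = 17
d = pow(e, -1, phi)       # modular inverse of e

def encrypt(m): return pow(m, e, n)
def decrypt(c): return pow(c, d, n)

a, b = 7, 12
# Multiplying ciphertexts corresponds to multiplying plaintexts (mod n),
# so a computation happens without ever decrypting the inputs:
c_product = (encrypt(a) * encrypt(b)) % n
print(decrypt(c_product))  # 84 == 7 * 12
```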
A "vulnerability that arises when attackers can access or modify objects by manipulating identifiers used in a web application's URLs or parameters." Poor access controls fail to verify if a user is authorized to access data (such as that of another user).
- Read more: Insecure Direct Object Reference Prevention Cheat Sheet
- Related: Authorization (AuthZ), Confidentiality
Data is free from corruption, manipulation, or other unknown modifications. Data is "authentic, accurate, and reliable".
- Read more: What is the CIA Triad and Why is it important?
- Related: Availability, CIA triad, Confidentiality
"A property of anonymized data" with "scientific guarantees that the individuals who are the subjects of the data cannot be re-identified while the data remain practically useful."
- Read more:
- Related: Privacy-enhancing technology (PET)
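A minimal sketch that computes the k of a table by grouping on hypothetical quasi-identifiers (age band and ZIP prefix); real tools also generalize and suppress values to reach a target k.

```python
from collections import Counter

# Hypothetical records: (age_band, zip_prefix, diagnosis)
records = [
    ("20-29", "123**", "flu"),
    ("20-29", "123**", "asthma"),
    ("30-39", "456**", "flu"),
    ("30-39", "456**", "flu"),
    ("30-39", "456**", "diabetes"),
]

def k_of(records, quasi_identifier_indices=(0, 1)):
    """k is the size of the smallest equivalence class over the quasi-identifiers."""
    classes = Counter(tuple(r[i] for i in quasi_identifier_indices) for r in records)
    return min(classes.values())

print(k_of(records))  # 2 -> every record is indistinguishable from at least one other
```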
A similar approach to k-anonymity, "except that it assumes that the attacker most likely doesn't know who is in the dataset."
- Read more:
- Related: Privacy-enhancing technology (PET)
An attempt to address k-anonymity attacks (such as homogeneity and background knowledge attacks), which "attempts to measure how much an attacker can learn about people in terms of k-anonymity and equivalence classes".
- Read more:
- Related: Privacy-enhancing technology (PET)
An algorithm that "does not necessarily always return the same result given the same input parameters in the same state" of a dataset. (Stack Overflow)
- Read more: Difference between Deterministic and Non-deterministic Algorithms
- Related: Deterministic algorithm
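A minimal sketch contrasting the two: a hash-based token (deterministic) always maps the same input to the same output, while a random token (non-deterministic) does not. The function names are illustrative.

```python
import hashlib
import uuid

def deterministic_token(value: str) -> str:
    """Same input always yields the same output (deterministic)."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def nondeterministic_token(value: str) -> str:
    """Ignores the input and returns a fresh random token each call (non-deterministic)."""
    return uuid.uuid4().hex[:12]

same = "alice@example.com"
print(deterministic_token(same) == deterministic_token(same))        # True
print(nondeterministic_token(same) == nondeterministic_token(same))  # False
```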
A collection of data privacy principles proposed by Dr. Ann Cavoukian to take a "proactive approach to privacy that emphasises the need to incorporate data protection practices into projects and decisions at the outset, rather than as an afterthought."
- Read more: Privacy by Design The 7 Foundational Principles
- Related: Software development lifecycle
An assortment of tools that enable businesses to comply with privacy regulations while preserving both individual privacy and the utility of their data sets, for purposes such as analytics, development, and sharing.
- Popular PETs: Homomorphic encryption, de-identification (k-anonymity), differential privacy, federated learning, secure multiparty computation, private set intersection, synthetic data, zero knowledge proofs, trusted execution environments
- Read more:
Also called "double encryption". A type of secure multiparty computation "in which each party has a set of items and the goal is to learn the intersection of those sets while revealing nothing else about those sets."
- Read more: A Brief Overview of Private Set Intersection
- Related: Privacy-enhancing technology (PET), Secure multi-party computation
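A minimal sketch of a Diffie-Hellman-style PSI exchange in which each party masks hashed items with a private exponent; the prime modulus, item lists, and variable names are toy assumptions, and real protocols use vetted elliptic-curve groups and hardened implementations.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus for illustration only

def hash_to_group(item: str) -> int:
    """Hash an item into the multiplicative group mod P."""
    digest = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big")
    return digest % P or 1

def mask(values, secret):
    """Exponentiation by a private key; masking twice is order-independent."""
    return [pow(v, secret, P) for v in values]

alice_items = ["alice@example.com", "bob@example.com", "carol@example.com"]
bob_items = ["bob@example.com", "dave@example.com"]
a_key, b_key = secrets.randbelow(P - 2) + 1, secrets.randbelow(P - 2) + 1

# Each party masks its own hashed set, exchanges it, and masks the other's set
# again; a double-masked value matches iff the underlying items match.
alice_masked_by_bob = mask(mask([hash_to_group(x) for x in alice_items], a_key), b_key)
bob_masked_by_alice = set(mask(mask([hash_to_group(x) for x in bob_items], b_key), a_key))

print([item for item, m in zip(alice_items, alice_masked_by_bob) if m in bob_masked_by_alice])
# ['bob@example.com'] -- the intersection, with nothing else revealed about either set
```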
A form of de-identification where personal identifiers are replaced with placeholder values (or "tokens"). Unlike anonymization, pseudonymization is reversible: the underlying data can still be linked back to an individual by whoever holds the mapping between tokens and identities.
- Read more:
- Related: Privacy-enhancing technology (PET)
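A minimal sketch of keyed pseudonymization with HMAC-derived tokens; the secret key and field names are hypothetical. Whoever holds the key (or a token-to-identity table) can re-link tokens to individuals, which is what separates this from anonymization.

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical-key-stored-separately"  # protect and rotate in practice

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a stable, keyed token."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

record = {"email": "jane.doe@example.com", "purchase_total": 42.50}
safe_record = {"email_token": pseudonymize(record["email"]),
               "purchase_total": record["purchase_total"]}
print(safe_record)
# Analysts can still join on email_token across tables, but only the key holder
# (or a token-to-identity lookup) can reverse the mapping.
```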
An attack to identify individuals in an "anonymized" dataset using external information and computing techniques to link individuals to their "de-identified" personal information.
- Read more:
- Related: Aggregation
Also known as "privacy-preserving computation". [Any] "cryptographic protocol that distributes a computation across multiple parties where no individual party can see the other parties’ data."
- Read more: What is Secure Multiparty Computation
- Related: Privacy-enhancing technology (PET), Private set intersection (PSI)
- A test to determine whether access to a resource (like personal information, a server, etc.) is still necessary by shutting off access to the resource entirely; if nobody "screams bloody murder," they didn't really need that resource. This can be applied to enforce data minimization.
- Read more: Microsoft uses a scream test to silence its unused servers
The process that an organization follows to design, develop, test, deploy, and maintain software. "Shifting privacy left" is the idea of ingraining privacy earlier, into product ideation and requirements drafting, in order to achieve privacy by design.
- Read more: Integrating Privacy Practices in Software Development Lifecycle
- Related: Privacy by design
- Organized and machine-readable data that can be queried programmatically, such as a relational (SQL) database
- Read more: Structured vs. Unstructured Data: What’s the Difference?
- Related: Unstructured data
A process of scanning a product's source code to identify personal data and understand how and where it is collected and processed throughout systems.
- Read more:
- Data, generated by artificial intelligence, that mathematically imitates real-world personal data, enabling organizations to use "privacy-compliant, production-like, and long-retention" data for analytics
- Read more: What is Synthetic Data?
- Watch: PEPR '24 - Compute Engine Testing with Synthetic Data Generation
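A minimal sketch that fits simple per-column distributions to hypothetical "real" data and samples synthetic rows from them; production generators model joint distributions (e.g., with copulas or GANs) and evaluate re-identification risk.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "real" data: ages and annual spend for 1,000 customers.
real_ages = rng.normal(loc=40, scale=12, size=1000).clip(18, 90)
real_spend = rng.lognormal(mean=6, sigma=0.5, size=1000)

# Fit simple marginal distributions and sample synthetic rows from them.
synthetic_ages = rng.normal(loc=real_ages.mean(), scale=real_ages.std(), size=1000).clip(18, 90)
synthetic_spend = rng.lognormal(mean=np.log(real_spend).mean(),
                                sigma=np.log(real_spend).std(), size=1000)

print(real_ages.mean(), synthetic_ages.mean())    # similar statistics...
print(real_spend.mean(), synthetic_spend.mean())  # ...but no synthetic row maps to a real person
```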
A process (originally from cybersecurity) to identify and understand threats to a system and their mitigations. In the context of privacy, this relates to threats to personal information in a system and to its data subjects.
- Read more: What is privacy threat modeling?
- Watch:
A segregated safe zone within a CPU where only signed code within the environment can be loaded, and all code is "processed in the clear but is only visible in encrypted form when anything outside tries to access it." This ensures that "even if a system is compromised, the data within the TEE remains secure."
- Read more: Basics of Trusted Execution Environments (TEEs): The Heart of Confidential Computing
- Watch: ISCA'23 - Lightning Talks - Session1C - TEESec: Pre-Silicon Vulnerability Discovery for Trusted Exec
- Unorganized data with no pre-defined structure or pattern, which makes it difficult (but not impossible) to reliably process and analyze programmatically. "More than 80%" of data on the internet is unstructured.
- Read more: Structured vs. Unstructured Data: What’s the Difference?
- Related: Structured data
A "cryptographic mechanism that allows anyone to prove the truth of a statement without having to share the information in a statement."
- Read more:
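A minimal sketch of a Schnorr proof of knowledge of a discrete logarithm, made non-interactive with the Fiat-Shamir heuristic; the group parameters are tiny toy values for illustration only, and real systems use standardized groups and audited libraries.

```python
import hashlib
import secrets

# Toy Schnorr zero-knowledge proof (insecure parameters; demo only).
p = 2039                 # safe prime: p = 2q + 1
q = (p - 1) // 2         # prime order of the subgroup (1019)
g = 4                    # generator of the order-q subgroup

x = secrets.randbelow(q)  # the Prover's secret
y = pow(g, x, p)          # the public statement: "I know x such that g^x = y"

def prove(x: int):
    r = secrets.randbelow(q)                                          # random nonce
    t = pow(g, r, p)                                                  # commitment
    c = int(hashlib.sha256(f"{t}:{y}".encode()).hexdigest(), 16) % q  # challenge
    s = (r + c * x) % q                                               # response
    return t, s

def verify(t: int, s: int) -> bool:
    c = int(hashlib.sha256(f"{t}:{y}".encode()).hexdigest(), 16) % q
    return pow(g, s, p) == (t * pow(y, c, p)) % p

print(verify(*prove(x)))  # True: the proof convinces the Verifier without revealing x
```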
- Zero-Knowledge Succinct Non-Interactive Argument of Knowledge
- Allows a Prover "[to create] a unique fingerprint for each proof [...] making it impossible to reverse-engineer the original statement. Essentially, these polynomial equations are solvable only by the Prover but verifiable by anyone. They're a puzzle to which only the Prover knows the answer, yet anyone can confirm the answer is correct without knowing what it is."
- Read more: