diff --git a/vpdq/README.md b/vpdq/README.md
index 565cf5b86..b83b5720c 100644
--- a/vpdq/README.md
+++ b/vpdq/README.md
@@ -8,18 +8,57 @@ See [CPP implementation](#cpp-implementation) for how to install and use vpdq.
## Compared to TMK+PDQF
-Compared to TMK+PDQF (TMK), which also relies on the PDQ image hashing algorithm:
-TMK optimizes for identical videos (same length), vPDQ can match subsequences or clips within videos.
-TMK has a fixed-length hash, which simplifies matching lookup, and can be near constant time with the help of FAISS. vPDQ produces a variable length hash, and requires a linear comparison of candidates. This requires either an O(n*Fc*Fq) lookup where n is the number of videos being compared, and Fc is the average number of frames per compared video and Fq is the number of frames in the source video, or an initial filtering pass to reduce the candidates, which can potentially discard matching videos.
-Both TMK and vPDQ are backed by PDQ, and so inherit both PDQ’s strengths and weaknesses.
+Compared to TMK+PDQF (TMK):
+
+| Feature | vPDQ | TMK+PDQF |
+|-----------------------------------------------------|:--------:|:--------:|
+| Uses PDQ for hashing frames | ✅ | ✅ |
+| Optimized for identical videos (same length) | ❌ | ✅ |
+| Supports subsequence/clip matching | ✅ | ❌ |
+| Fixed-length hash (near-constant-time match lookup) | ❌ | ✅ |
+
+Both TMK and vPDQ use PDQ to hash video frames, so both inherit PDQ's strengths and weaknesses.
+
+TMK optimizes for identical videos (same length). vPDQ can match subsequences or clips within videos.
+
+TMK has a fixed-length hash, which simplifies match lookup and can be near constant time with the help of FAISS. vPDQ produces a variable-length hash and requires a linear comparison against candidates.
+* vPDQ linear search complexity is O(n*Fc*Fq), where n is the number of videos being compared, Fc is the average number of frames per compared video, and Fq is the number of frames in the source video. For example, comparing a 300-frame source video against 10,000 candidates averaging 300 frames each is on the order of 10^9 hash comparisons.
 * To speed this up, an initial filtering pass can be used to reduce the candidates. The downside is that filtering can discard matching videos, causing false negatives.
## Description of Algorithm
-### Producing a Hash
+### Producing a Video Hash
+
+The algorithm for producing the video hash is:
+
+1. Given a video, convert it into a sequence of frame images.
+2. For each frame image, use the PDQ hashing algorithm to produce a frame perceptual hash.
+3. Finally, assemble the collection of hashed frames to produce the video perceptual hash.
+
+> **Note:** A subset of the frames can be hashed, such as one frame per second, to reduce the number of frame
+> perceptual hashes. In general, adjacent frames are very similar, so hashing all of them adds little value for finding matches.
+
+The following diagram shows the high level data flow for hashing a video with vPDQ:
+
+```mermaid
+---
+title: vPDQ Data Flow
+config:
+ theme: light
+---
+flowchart LR
+video@{ shape: lean-r, label: "Video" } -->|decoded as| frames@{ shape: processes, label: "RGB Frame"}
+frames --> pdq((PDQ))
+pdq --> phashes@{ shape: processes, label: "Frame\nPerceptual Hash"}
+phashes -->|assembled into| video_phash@{ shape: lean-r, label: "Video Perceptual Hash" }
+```
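+
+The steps above can be sketched in Python-style pseudocode. Here `decode_frames` and `pdq_hash_frame` are hypothetical placeholders standing in for a video decoder and the per-frame PDQ hasher, not real APIs:
+
+```
+def hash_video(video_path, frames_per_sec=1):
+    video_phash = []
+    for frame_number, frame in decode_frames(video_path, frames_per_sec):
+        # PDQ yields a 256-bit perceptual hash and a 0-100 quality score
+        pdq_hash, quality = pdq_hash_frame(frame)
+        timestamp = frame_number / frames_per_sec
+        video_phash.append((frame_number, quality, pdq_hash, timestamp))
+    return video_phash
+```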
+
+#### Frame Metadata
+
+We can annotate each frame hash with its frame number, its quality (0-100, a measure of the frame's gradients/features, from least to most "featureful"), and its timestamp (sec):
-The algorithm for producing the “hash” is simple: given a video, convert it into a sequence of frame images at some interval (for example, 1 frame/second). For each frame image, use the PDQ hashing algorithm on each.
+**Example**: a 5-minute video hashed at 1 frame/sec
-We can annotate these hashes with their frame number, quality(0-100 which measures gradients/features,from least featureful to most featureful) and timestamp(sec). So for a 5 minute video at 1 frame/sec, we might have:
| Frame | Quality | PDQ Hash | Timestamp (sec) |
| ------------- | ------------- | ------------- | ------------- |
| 1 | 100 | face000... | 0.000 |
@@ -32,11 +71,11 @@ We can annotate these hashes with their frame number, quality(0-100 which measur
For the matching algorithm, the frame numbers are not used, but they can still be useful for identifying matching segments when comparing videos.
-### Pruning Frames
+#### (Optional) Pruning Frames for Faster Comparison and Smaller Video Hashes
Often, many frames are repeated in a video, or frames are very close to each other in PDQ distance. It is possible to reduce the number of frames in a hash by omitting subsequent frames that are within a distance Dprune of the last retained frame.
-In the previous example, with Dprune of 2 we might instead end up with:
+Using the previous example, with a Dprune of 2 we might end up with:
| Frame | PDQ Hash | Distance from last retained frame | Result |
| ------------- | ------------- | ------------- |------------- |
| 1 | face000... | N/A | Retain |
@@ -46,7 +85,7 @@ In the previous example, with Dprune of 2 we might instead end up wit
| 5 | face111... | 0 | Prune |
| ... | ... | ... | ... |
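+
+This pruning pass can be sketched in pseudocode; `hamming_distance` here stands for the bitwise distance between two 256-bit PDQ hashes:
+
+```
+def prune_frames(frame_hashes, d_prune):
+    retained = [frame_hashes[0]]
+    for frame_hash in frame_hashes[1:]:
+        # Retain a frame only if it differs enough from the last retained frame
+        if hamming_distance(frame_hash, retained[-1]) > d_prune:
+            retained.append(frame_hash)
+    return retained
+```
+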
-Afterwards, what is left is:
+After pruning the previous example with Dprune of 2, the vPDQ hash may look like:
| Frame | PDQ Hash |
| ------------- | ------------- |
| 1 | face000... |
@@ -99,7 +138,9 @@ is_match = c_pct_matched >= P_c and q_pct_matched >= P_q
> **Note**: The frame number and the timestamp is not used at all in this comparison. The frames are treated as an unordered “bag of hashes”. The frame number and timestamp are included in each feature in the reference implementation in case of future expansion.
-### Pruning Candidates
+Beyond pruning near-duplicate frames, it may be desirable to index only sampled or key frames of candidate videos to control index size, but this may result in matching videos being incorrectly discarded.
+
+#### (Optional) Pruning Candidate Videos for Faster Match Search
When the number of potential candidates is high, the n*Fc*Fq algorithm might be too expensive to run. One potential solution for filtering is indexing frames from candidate videos into an index like FAISS, keyed to the video to compare. Our lookup algorithm then becomes:
@@ -120,8 +161,6 @@ for c_id in candidate_video_ids:
```
-Beyond pruning frames from candidates, it may be desirable to further prune to just sampled or key frames in candidate videos to control index size, but this may result in videos being incorrectly pruned.
-
## CPP Implementation
The reference implementation for vpdq is written in C++. In addition, there are [Python bindings](#python-binding) to allow the use of vpdq from Python.