You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We propose a way to optimize the matchkey size by letting the User Agent Vendor (UAV) be a fallback matchkey provider for its User Agents (UAs) instead of having the UA generate random on-device matchkeys when none has been set by a Matchkey Provider (MKP).
Our original design for setting matchkeys allowed for two ways of setting the matchkey 1) where a MKP calls the set_match_key() API and 2) when no MKP has set a key the device would generate its own matchkey the first time get_encrypted_matchkey() is called. The second case leads to the need for much larger size matchkeys to avoid collisions (think birthday paradox). Here we propose an additional way to set the matchkey which allows the User Agent Vendor to be a fallback matchkey provider for its User Agents.
The User Agent Vendor would supply each of its User Agents with a unique matchkey. These matchkeys would only enable same-user-agent attributions, thus enabling the same functionality as having the UA set its own matchkey randomly. But what we achieve here is a significant optimization in terms of the size required for these matchkeys. Since they can be set uniquely as the UAV we don’t need to increase the size to bound the probability of spurious collisions. Since sorting in the MPC attribution scales with the number of bits in the matchkey and sort is still the most expensive stage in MPC, this translates to a large optimization for the MPC.
Setting the matchkey
We propose three ways of setting the matchkey: 1) by a Matchkey Provider (MKP), 2) by the User Agent Vendor (UAV), or 3) by the UA itself. We also propose that the matchkey itself encodes how it was set so there is no overlap between matchkeys set from the different methods.
For the structure of the matchkey
The first bit is called set_by_MKP and is a 1 if the matchkey has been set by that matchkey provider and 0 otherwise. When a MKP sets a matchkey they will set the 44 bit mk_from_MKP part to be unique for each of their users.
In the case a MKP has not set their matchkey on a UA and the get_encrypted_matchkey() is called, the API will still return a matchkey but it will be one unique to that user agent. For setting such a matchkey we propose that the UAV can act as a fallback MKP supplying each of its user agents with a unique matchkey. The second bit of the matchkey is called set_by_UAV and is a 1 if the UAV has set the matchkey and 0 otherwise. In the case the UAV has set the matchkey, they will use the next 8 bits to set their UAV_id. The reason for this is that we don’t want collisions coming from different UAVs and don’t want to rely on the User Agent String separate from these IPA APIs to determine which UA a user is using. With the remaining 34 bits the UAV will provide each of its UA with a unique key. Allocating 8 bits for the UAV_id would help to future proof the design and if not that many are needed immediately then browsers with lots of users could take multiple entries (from which they can allocate as though they were additional bits), which would allow small and large browsers to coexist.
In the case that neither a MKP nor a UAV have set a matchkey the first 2 bits will be zeros and the UA will generate and store its own matchkey the first time get_encrypted_matchkey() is called. Since UAs would be doing this independently of each other there will be the possibility of collisions.
Hiding user counts
When the MKP or the UAV sets a matchkey uniquely for each of its users, they likely do not want the matchkeys to reveal anything about their total user counts (in case there is a way with device access to learn what matchkeys have been set on your device). To prevent this a MKP/UAV could use a keyed pseudo-random permutation (PRP) to map from a user index [0, number_of_users) to a still small space [0,2^34). Since this is a permutation, it will not introduce any collisions but will prevent a few matchkeys from telling anything about how many total users a MKP has. This may be one such available primitive to use for the PRP.
Collisions for UA generated matchkeys
As mentioned above, for matchkeys that are generated independently on devices when no MKP or UAV has set the matchkey there is the possibility of collisions. We would like to see how many such matchkeys a query can contain before there start to be spurious collisions. The expected number of matchekeys that are the same as some other in the query is$E=n(1-(1-1/N)^{n-1})$ where n is query size and N is the number of matchkeys.
Using this formula, if we can tolerate an expected number of collisions < 10, then with 43 bits for the matchkey we need to have fewer than ~9.3M randomly generated matchkeys in a query.
If we can tolerate an expected number of collisions < 1, then with 43 bits for the matchkey we need to have fewer than ~3M randomly generated matchkeys in a query.
Matchkey Specification: UAV as fallback MKP
Summary:
Our original design for setting matchkeys allowed for two ways of setting the matchkey 1) where a MKP calls the
set_match_key()
API and 2) when no MKP has set a key the device would generate its own matchkey the first timeget_encrypted_matchkey()
is called. The second case leads to the need for much larger size matchkeys to avoid collisions (think birthday paradox). Here we propose an additional way to set the matchkey which allows the User Agent Vendor to be a fallback matchkey provider for its User Agents.The User Agent Vendor would supply each of its User Agents with a unique matchkey. These matchkeys would only enable same-user-agent attributions, thus enabling the same functionality as having the UA set its own matchkey randomly. But what we achieve here is a significant optimization in terms of the size required for these matchkeys. Since they can be set uniquely as the UAV we don’t need to increase the size to bound the probability of spurious collisions. Since sorting in the MPC attribution scales with the number of bits in the matchkey and sort is still the most expensive stage in MPC, this translates to a large optimization for the MPC.
Setting the matchkey
We propose three ways of setting the matchkey: 1) by a Matchkey Provider (MKP), 2) by the User Agent Vendor (UAV), or 3) by the UA itself. We also propose that the matchkey itself encodes how it was set so there is no overlap between matchkeys set from the different methods.
For the structure of the matchkey
The first bit is called
set_by_MKP
and is a 1 if the matchkey has been set by that matchkey provider and 0 otherwise. When a MKP sets a matchkey they will set the 44 bitmk_from_MKP
part to be unique for each of their users.In the case a MKP has not set their matchkey on a UA and the get_encrypted_matchkey() is called, the API will still return a matchkey but it will be one unique to that user agent. For setting such a matchkey we propose that the UAV can act as a fallback MKP supplying each of its user agents with a unique matchkey. The second bit of the matchkey is called
set_by_UAV
and is a 1 if the UAV has set the matchkey and 0 otherwise. In the case the UAV has set the matchkey, they will use the next 8 bits to set theirUAV_id
. The reason for this is that we don’t want collisions coming from different UAVs and don’t want to rely on the User Agent String separate from these IPA APIs to determine which UA a user is using. With the remaining 34 bits the UAV will provide each of its UA with a unique key. Allocating 8 bits for theUAV_id
would help to future proof the design and if not that many are needed immediately then browsers with lots of users could take multiple entries (from which they can allocate as though they were additional bits), which would allow small and large browsers to coexist.In the case that neither a MKP nor a UAV have set a matchkey the first 2 bits will be zeros and the UA will generate and store its own matchkey the first time
get_encrypted_matchkey()
is called. Since UAs would be doing this independently of each other there will be the possibility of collisions.Hiding user counts
When the MKP or the UAV sets a matchkey uniquely for each of its users, they likely do not want the matchkeys to reveal anything about their total user counts (in case there is a way with device access to learn what matchkeys have been set on your device). To prevent this a MKP/UAV could use a keyed pseudo-random permutation (PRP) to map from a user index [0, number_of_users) to a still small space [0,2^34). Since this is a permutation, it will not introduce any collisions but will prevent a few matchkeys from telling anything about how many total users a MKP has. This may be one such available primitive to use for the PRP.
Collisions for UA generated matchkeys
As mentioned above, for matchkeys that are generated independently on devices when no MKP or UAV has set the matchkey there is the possibility of collisions. We would like to see how many such matchkeys a query can contain before there start to be spurious collisions. The expected number of matchekeys that are the same as some other in the query is$E=n(1-(1-1/N)^{n-1})$ where n is query size and N is the number of matchkeys.
Thanks @martinthomson for lots of good discussions on this and for suggesting how to hide the user counts and @benjaminsavage, @richajaindce and @akoshelev for reviewing.
The text was updated successfully, but these errors were encountered: