Clarification requested re: CompressedBase64UrlEncoder #26

Open
matt-martin opened this issue Jun 2, 2023 · 0 comments
I'm trying to understand why CompressedBase64UrlEncoder is written the way it is. More specifically, I'm not sure what this while loop is doing and it seems to me like it can be removed:

    while (bitString.length % 8 > 0) {
      bitString += "0";
    }

Imagine the bitString is 6 bits long. That could be represented as a single base64 digit, but this while loop will add two 0's at the end so that the length of the bitString is 8. Then the subsequent while loop will add four more 0's to get the length of the bitString back to a multiple of 6 (i.e. 12). That is 6 bits of padding and it means that the encoded value in this case has a seemingly unnecessary "A" at the end.
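To make the arithmetic above concrete, here is a minimal sketch of the two padding strategies. It is a hypothetical stand-in, not the library's actual code: `padTo`, `encodeBits`, and the `padToByteFirst` flag are names I made up for illustration, and I'm assuming the usual base64url alphabet.

```typescript
// Hypothetical stand-in for the encoder's padding logic (not the library's code).
// Assumption: the standard base64url alphabet.
const BASE64URL_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Append "0" bits until the length is a multiple of `multiple`.
function padTo(bits: string, multiple: number): string {
  while (bits.length % multiple > 0) {
    bits += "0";
  }
  return bits;
}

// Encode a bit string as base64url digits. When `padToByteFirst` is true,
// the bits are first padded to a multiple of 8 (the loop in question),
// then to a multiple of 6; otherwise only the 6-bit padding is applied.
function encodeBits(bits: string, padToByteFirst: boolean): string {
  let padded = padToByteFirst ? padTo(bits, 8) : bits;
  padded = padTo(padded, 6);
  let out = "";
  for (let i = 0; i < padded.length; i += 6) {
    out += BASE64URL_CHARS.charAt(parseInt(padded.substring(i, i + 6), 2));
  }
  return out;
}

const sixBits = "000011";
console.log(encodeBits(sixBits, true));  // "DA": 6 bits -> 8 bits -> 12 bits
console.log(encodeBits(sixBits, false)); // "D": already a multiple of 6
```

With the byte-alignment step, a 6-bit input picks up six padding bits and one extra "A"; without it, the same input fits in a single digit.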

For a more concrete example, consider this test case for the Header section:

  it("should encode section ids [2] to DBABMA", (): void => {
    let headerV1 = new HeaderV1();
    headerV1.setFieldValue("SectionIds", [2]);
    expect(headerV1.encode()).to.eql("DBABMA");
  });

The six bits represented by the "A" at the end of "DBABMA" are unnecessary and the Header can be decoded just fine without it. DBABM~ decodes without issue as:

{"tcfeuv2":{"Version":2,"Created":"Fri Jun 02 2023 15:19:38 GMT-0700 (Pacific Daylight Time)","LastUpdated":"Fri Jun 02 2023 15:19:38 GMT-0700 (Pacific Daylight Time)","CmpId":0,"CmpVersion":0,"ConsentScreen":0,"ConsentLanguage":"EN","VendorListVersion":0,"PolicyVersion":2,"IsServiceSpecific":false,"UseNonStandardStacks":false,"SpecialFeatureOptins":[false,false,false,false,false,false,false,false,false,false,false,false],"PurposeConsents":[false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false],"PurposeLegitimateInterests":[false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false],"PurposeOneTreatment":false,"PublisherCountryCode":"AA","VendorConsents":[],"VendorLegitimateInterests":[],"PublisherRestrictions":[],"PublisherPurposesSegmentType":3,"PublisherConsents":[false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false],"PublisherLegitimateInterests":[false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false],"NumCustomPurposes":0,"PublisherCustomConsents":[],"PublisherCustomLegitimateInterests":[],"VendorsAllowedSegmentType":2,"VendorsAllowed":[],"VendorsDisclosedSegmentType":1,"VendorsDisclosed":[]}}

EDIT/UPDATE: On closer inspection, I see line 272 of the spec actually states that the Header should be encoded as DBABM. But then down below on line 603 it states the same Header should be encoded as DBABMA.
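As a rough sanity check that DBABM and DBABMA carry the same information, the sketch below expands each base64 digit into its six bits and compares the results. `toBitString` is a hypothetical helper of my own, not the library's decoder, and it again assumes the base64url alphabet.

```typescript
// Hypothetical helper (not the library's decoder).
// Assumption: the standard base64url alphabet.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Expand every base64url digit into its 6-bit binary representation.
function toBitString(encoded: string): string {
  let bits = "";
  for (let i = 0; i < encoded.length; i++) {
    const value = ALPHABET.indexOf(encoded.charAt(i));
    if (value < 0) {
      throw new Error("Unexpected character: " + encoded.charAt(i));
    }
    // Left-pad the binary form to exactly six characters.
    bits += ("000000" + value.toString(2)).slice(-6);
  }
  return bits;
}

const withTrailingA = toBitString("DBABMA");
const withoutTrailingA = toBitString("DBABM");

// The longer form is the shorter form plus six zero bits of padding.
console.log(withTrailingA === withoutTrailingA + "000000"); // true
```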

SIDENOTE: this might be obvious to folks who are already experts in Base64 encoding, but the traditional (i.e. non-compressed) base64 encoded string for a Header with a single section ID of 2 (i.e. the same example we used previously) would be DBABMAAA (e.g. see this test). If we drop that padding in order to compress the encoded GPP string as much as possible, then other utilities/libraries that expect traditional Base64 encoding may not be happy. I ran into this when trying to come up with a one-liner on my Mac to convert a base64 string to a bit string. This command produces no output:

echo -n "DBABM" | base64 -d | xxd -b -g0

But adding the full/traditional padding to the end makes the base64 command happy again:

$ echo -n "DBABMAAA" | base64 -d | xxd -b -g0
00000000: 000011000001000000000001001100000000000000000000  ...0..

It might be worth explicitly mentioning somewhere in the spec that if folks, for some reason, want to decode base64-encoded values from a GPP string using a standard Base64 utility/library, they might need to add "A" to the end of the string until the overall length of the encoded string is a multiple of 4. Or at least that is my current impression based on the example above. Folks with more Base64 knowledge/expertise should feel free to correct me.
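For what it's worth, the workaround I have in mind could be sketched like this. `padForStandardBase64` is a hypothetical helper, not something the library provides. I'm also assuming the input uses the base64url alphabet, so the sketch swaps `-` and `_` for `+` and `/`, which standard Base64 tools expect.

```typescript
// Hypothetical helper: make a compressed GPP base64url value acceptable
// to a standard Base64 decoder. Not part of the library.
function padForStandardBase64(encoded: string): string {
  // Assumption: input uses the base64url alphabet; translate to standard Base64.
  let out = encoded.replace(/-/g, "+").replace(/_/g, "/");
  // Each appended "A" contributes six zero bits, so the decoded output
  // may end with extra zero bits or bytes that are not part of the data.
  while (out.length % 4 > 0) {
    out += "A";
  }
  return out;
}

console.log(padForStandardBase64("DBABM"));  // "DBABMAAA"
console.log(padForStandardBase64("DBABMA")); // "DBABMAAA"
```

The result can then be piped through something like `base64 -d`, as in the command above.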
