Systems of equations attack to recover visited websites? #112
Hi Lukasz, I don't quite understand what you're proposing here. For a single FLoC cohort, per the data here, there are hundreds of different browsing histories that lead to being in the same cohort (at least 735 for the "Chrome.2.1" clustering algorithm). It seems like you're trying to make a stronger inference by looking at one person's sequence of cohorts over time. Is the addition here some mathematical model of how likely a person's browsing history is to stay consistent, rather than vary, from week to week?

From a Bayesian point of view, each time you observe someone's cohort, you can update your belief that the person has visited some particular website. So before I've seen you at all, my guess at P(you visit facebook every week) is some baseline probability p0, say 44% if you're in London. Maybe after seeing someone's FLoC once, that would rise to 51%, and after seeing their FLoC every week for a year it would be 58%. That estimate could be informed by some sort of algebraic system-of-equations approach, or just by observing the behavior of the people in a FLoC. I would expect the observational method to be better, because the algebraic one would run into a skyrocketing number of unknowns about all the other sites someone might have visited. Or am I completely missing your point here? Obviously all these numbers are totally made up, but they represent the kind of information extraction that I can imagine.
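The Bayesian reading in the reply above can be sketched as a sequential update in odds form: each observed weekly cohort multiplies the current odds by a likelihood ratio. This is a minimal sketch under loudly stated assumptions: `update_belief` is a hypothetical helper, the likelihood ratios are invented for illustration (just like the reply's own percentages), and the weekly cohorts are assumed to be conditionally independent given the attribute, which real browsing histories will not satisfy exactly.

```python
from typing import Iterable


def update_belief(prior: float, likelihood_ratios: Iterable[float]) -> float:
    """Sequential Bayesian update of P(attribute) in odds form.

    prior: baseline probability before any cohort is seen (e.g. the made-up 44%).
    likelihood_ratios: one hypothetical value per observed weekly cohort c_t,
        P(c_t | attribute) / P(c_t | not attribute).
    Assumes weekly cohorts are conditionally independent given the attribute.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio  # Bayes' rule: posterior odds = prior odds * likelihood ratio
    return odds / (1.0 + odds)


# Hypothetical: one cohort observation that is 1.3x as likely for weekly visitors.
after_one_week = update_belief(0.44, [1.3])
# Hypothetical: a year of weekly observations, each only weakly informative.
after_one_year = update_belief(0.44, [1.01] * 52)
print(f"{after_one_week:.3f} {after_one_year:.3f}")
```

Because the likelihood ratios multiply, even weakly informative weekly observations compound over a year under the independence assumption. If consecutive weeks are positively correlated, as habitual browsing tends to be, treating them as independent overstates the evidence.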
Yes, I wonder what impact observing FLoC IDs over several weeks would have on discovering the browsed sites. Specifically, does it decrease the likelihood of certain collisions for specific users?
It's a fair way of seeing things, though I did not have this one in mind. But the ideas could perhaps help with the above hypothesis.
I'm building on this issue by @johnwilander.
Just a slight thought.
What if we consider a slightly different threat/exploitation scenario (unless it is simply a flavour of the remarks quoted above, which is why I retain them here), linked to the risks I already pointed to? Specifically, reversing the cohort ID to obtain the websites actually visited.
A hypothetical way to improve such a reversal could follow this reasoning: we know that the set of sites visited in week_i (i being a given week of the year) corresponds to a specific ID_i, and the FLoC computation takes website addresses as its input. So I wonder whether it would be possible to construct a system of equations of the form:

$$\mathrm{ID}_i = \mathrm{SimHash}(S_i), \qquad i = 1, \dots, n,$$

where $S_i$ is the set of sites visited in week $i$.
And then use the properties of SimHash to recover the visited websites, or at least to improve the inference of which websites were visited. Note that I did not work out the analytical solution, so I do not know under what circumstances such a system of equations would be solvable. It would be interesting to consider it in the threat model, though, so I leave the proof as an exercise for the proponents. Thank you.
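To make the proposal concrete, here is a toy sketch. It assumes a generic random-hyperplane SimHash, in which bit $j$ of $\mathrm{ID}_i$ is 1 iff $\sum_{d \in S_i} w_j(d) > 0$ for pseudorandom weights $w_j$ (here `weight(d, j)`), not the exact computation Chrome used. It also assumes a tiny hypothetical universe of ten made-up domains so that all $2^{10}$ candidate histories can be enumerated. The helper names `weight`, `simhash`, `preimages` and `possibly_weekly`, the 6-bit width, and the data are all hypothetical. For one week it counts the histories that collide into the same ID; across weeks it keeps only the domains that could have been visited in every observed week, which is one simple reading of "improving the inference".

```python
import hashlib
import itertools
from functools import lru_cache
from typing import FrozenSet, Iterable, List, Sequence, Set

BITS = 6  # hypothetical, deliberately tiny cohort-ID width


@lru_cache(maxsize=None)
def weight(domain: str, bit: int) -> float:
    """Pseudorandom weight in [-1, 1) for a (domain, bit) pair.

    Illustrative stand-in for SimHash's random hyperplanes; NOT Chrome's hashing.
    """
    digest = hashlib.sha256(f"{domain}|{bit}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**63 - 1.0


def simhash(domains: FrozenSet[str], bits: int = BITS) -> int:
    """Bit j is 1 iff the weights of the visited domains sum to a positive value."""
    value = 0
    for j in range(bits):
        if sum(weight(d, j) for d in domains) > 0:
            value |= 1 << j
    return value


def preimages(universe: Sequence[str], cohort_id: int) -> List[FrozenSet[str]]:
    """Every subset of the universe (every candidate S_i) with SimHash(S_i) == cohort_id."""
    matches = []
    for r in range(len(universe) + 1):
        for combo in itertools.combinations(universe, r):
            candidate = frozenset(combo)
            if simhash(candidate) == cohort_id:
                matches.append(candidate)
    return matches


def possibly_weekly(universe: Sequence[str], observed_ids: Iterable[int]) -> Set[str]:
    """Domains consistent with being visited in every observed week.

    A domain survives week i only if some candidate S_i containing it hashes to ID_i,
    so each extra equation can only shrink (never grow) the surviving set.
    """
    survivors = set(universe)
    for cohort_id in observed_ids:
        covered: Set[str] = set()
        for candidate in preimages(universe, cohort_id):
            covered |= candidate
        survivors &= covered
    return survivors


# Hypothetical data: ten made-up domains; site0 is visited every week.
universe = [f"site{k}.example" for k in range(10)]
weeks = [
    frozenset({"site0.example", "site3.example", "site7.example"}),
    frozenset({"site0.example", "site2.example", "site5.example"}),
    frozenset({"site0.example", "site4.example", "site8.example", "site9.example"}),
]
ids = [simhash(w) for w in weeks]  # what an observer would see: ID_1, ID_2, ID_3

print("histories colliding with week 1:", len(preimages(universe, ids[0])))
for n in range(1, len(ids) + 1):
    print(n, "week(s):", sorted(possibly_weekly(universe, ids[:n])))
```

In this toy setting, each added week can only narrow the surviving set, and a domain truly visited every week always survives. Brute force over all $2^{|U|}$ subsets is only feasible for a toy universe, though; with a realistic number of domains the unknowns explode, which is the objection raised in the reply above.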