Question about lambda greedy calculation #2

ppartarr · 2021-02-17T07:42:50Z

I really like your work on typo correction! I read your 2016 paper and I've been digging through the code to try and understand it better.

I am curious about how the security loss lambda q greedy is calculated for the various checkers. After solving the best-q-guesses problem in your experiment, you sum the probability of the union ball for every password in the best greedy guesses:

mistypography/security/compute_sec_loss.ver1.py

Line 207 in f0fb62c

ball = typofixer.get_ball_union(tpwlist[:q])

mistypography/security/compute_secloss.py

Line 30 in f0fb62c

union_ball = set([

I understand that the union ball would be the checked ball for the always checker but this isn't the case for the blacklist & optimal checkers. It seems to me that lambda q greedy should be calculated using the checked ball with typofixer.check(password) | set([password]).

Looking forward to hearing back from you!


rchatterjee · 2021-02-18T03:04:52Z

So, if I understand correctly, you are asking why \lambda_q^greedy takes a union over the guesses?
ball(tpw) denotes the set of all real passwords for which tpw is a valid typo. If the attacker guesses tpw, they gain an advantage equal to sum([p(rpw) for rpw in ball(tpw)]). That is exactly the q=1 case. For q>1 we need the union of the balls, which is computed by typofixer.get_ball_union, which you can find in this line.
Does this clarify your doubt?
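
As a minimal sketch of why the union matters for q>1: the distribution below is made up, and `ball`, `mass` and `lambda_q` are hypothetical helpers (not the repository's functions) that assume every correction is always tried. Adding up per-guess advantages counts shared passwords twice, while summing over the union counts each password once.

```python
from fractions import Fraction

# Made-up distribution over real passwords (illustrative only).
PROB = {
    "password": Fraction(6, 20),
    "123456": Fraction(5, 20),
    "password1": Fraction(4, 20),
    "Password1": Fraction(3, 20),
    "Password": Fraction(2, 20),
}


def ball(tpw):
    """Real passwords that would accept tpw if every correction is tried."""
    # Approximations of swc-all, swc-first and rm-last, plus tpw itself.
    candidates = {tpw, tpw.swapcase(), tpw[:1].swapcase() + tpw[1:], tpw[:-1]}
    return {c for c in candidates if c in PROB}


def mass(passwords):
    """Total probability of a set of real passwords."""
    return sum((PROB[pw] for pw in passwords), Fraction(0))


def lambda_q(guesses):
    """Probability mass of the union of the balls of all guesses."""
    union = set()
    for tpw in guesses:
        union |= ball(tpw)
    return mass(union)


guesses = ["password1", "Password1"]
# Naive per-guess sum double counts the passwords both balls share.
naive = sum((mass(ball(g)) for g in guesses), Fraction(0))
union_based = lambda_q(guesses)
```

Here the two balls overlap on password1 and Password1, so the naive sum exceeds 1 while the union-based value stays a valid probability.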

Also, typofixer.check(password) | set(password) is not correct, as password is a string, and set(password) will create a set with the characters from the password.
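
A short illustration of that pitfall, using only built-in Python behaviour (the variable names are just for this example):

```python
password = "rockyou2"

# set() iterates over its argument, so a string becomes a set of characters.
as_chars = set(password)

# Wrapping the string in a list (or using a set literal) keeps it whole.
as_single = set([password])
as_literal = {password}
```

Only the last two forms give a one-element set that can be combined with the set of checked passwords.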

ppartarr · 2021-02-18T08:35:55Z

Thanks for your answer! It didn't quite clarify what I'm confused about, so allow me to rephrase 😄

Why does lambda q greedy take the union ball instead of taking the union of the checked passwords and the password itself?

Say we have q = 1 and we are using the blacklist checker with the blacklist shown below. Let's use the top 3 correctors: swc-all, swc-first, and rm-last. The attacker is an exact knowledge attacker and knows the password distribution. For the sake of this example let's say that after solving the greedy weighted max heap coverage problem, the attacker guesses "rockyou2".

We have typofixer.get_ball_union(["rockyou2"]) = ["Rockyou2", "ROCKYOU2", "rockyou", "rockyou2"] but if the attacker submits rockyou2 the checked passwords will be typofixer.check(tpw) | set([tpw]) = ["Rockyou2", "ROCKYOU2", "rockyou2"]

Notice how rockyou isn't checked under the blacklist checker because it is in the blacklist. Since it's not being checked, I'm confused about why its probability is included in the calculation for the security loss lambda q greedy.
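
The discrepancy can be reproduced with a minimal sketch. The helper names (`corrections`, `unfiltered_ball`, `checked`) are hypothetical and are not the repository's functions; the three correctors are approximated with plain string operations, and the blacklist is the top-10 RockYou list given in this comment.

```python
# Top-10 RockYou passwords, used here as the blacklist.
BLACKLIST = {
    "123456", "12345", "123456789", "password", "iloveyou",
    "princess", "1234567", "rockyou", "12345678", "abc123",
}


def corrections(tpw):
    """Approximate swc-all, swc-first and rm-last on a submitted string."""
    return {tpw.swapcase(), tpw[:1].swapcase() + tpw[1:], tpw[:-1]}


def unfiltered_ball(tpw):
    """Every correction plus the guess itself, ignoring the blacklist."""
    return corrections(tpw) | {tpw}


def checked(tpw):
    """What a blacklist checker would try: blacklisted corrections are skipped."""
    return {c for c in corrections(tpw) if c not in BLACKLIST} | {tpw}


guess = "rockyou2"
# Passwords counted by the unfiltered ball but never actually checked.
only_in_ball = unfiltered_ball(guess) - checked(guess)
```

Under these assumptions `only_in_ball` contains just rockyou, which is the password whose probability would be counted without ever being checked.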

typofixer.check(tpw) | set([tpw]) is also how the weights are calculated

mistypography/security/compute_sec_loss.ver1.py

Line 28 in f0fb62c

def power(tpw):
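
For context, a rough sketch of greedy guess selection with weights taken from the checked set. This is not the repository's implementation: it rescans all candidates each round instead of using a heap, the password counts are made up, and `checked` and `greedy_guesses` are hypothetical names.

```python
# Top-10 RockYou passwords, used here as the blacklist.
BLACKLIST = {
    "123456", "12345", "123456789", "password", "iloveyou",
    "princess", "1234567", "rockyou", "12345678", "abc123",
}

# Made-up frequency counts for real passwords (illustrative only).
COUNTS = {
    "rockyou": 50, "rockyou2": 5, "Rockyou2": 3,
    "ROCKYOU2": 1, "iloveyou1": 4, "Iloveyou1": 2,
}


def checked(tpw):
    """Guess plus its non-blacklisted corrections (swc-all, swc-first, rm-last)."""
    fixes = {tpw.swapcase(), tpw[:1].swapcase() + tpw[1:], tpw[:-1]}
    return {c for c in fixes if c not in BLACKLIST} | {tpw}


def greedy_guesses(candidates, counts, q):
    """Pick up to q guesses, each maximizing the newly covered count."""
    covered, guesses = set(), []
    for _ in range(q):
        remaining = [c for c in candidates if c not in guesses]
        if not remaining:
            break
        # Marginal weight: passwords this guess covers that are not covered yet.
        best = max(
            remaining,
            key=lambda c: sum(counts.get(pw, 0) for pw in checked(c) - covered),
        )
        guesses.append(best)
        covered |= checked(best)
    return guesses, sum(counts.get(pw, 0) for pw in covered)


guesses, covered_count = greedy_guesses(
    ["rockyou2", "iloveyou1", "Rockyou2"], COUNTS, q=2
)
```

With these toy counts the second pick is iloveyou1 rather than Rockyou2, because everything Rockyou2 would add is already covered by the first guess.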

10 most frequent passwords in rockyou

123456
12345
123456789
password
iloveyou
princess
1234567
rockyou
12345678
abc123

Edit: I corrected the previous question to use set([password]) instead of the set(password)

rchatterjee · 2021-03-04T05:29:12Z

Sorry for the late reply. Let me see if I understand your question this time. If not, I will be happy to jump on a short Zoom call sometime next week.

It's been a long time since I looked closely at the code. I think you are right: the `get_ball_union` function should use `self.check` instead of `self.get_ball`. The `check` function is aware of the blacklist, etc., but `get_ball` is not. Thanks a lot for pointing that out. I would really appreciate it if you could test and submit a pull request.

- Rahul
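
A minimal sketch of the change described above, using a toy class. `ToyTypoFixer` is a hypothetical stand-in and not the repository's typofixer: only the method names `get_ball`, `check` and `get_ball_union` come from this thread, and their bodies here are assumptions built from the three correctors in the example.

```python
class ToyTypoFixer:
    """Hypothetical stand-in for the typofixer; not the repository's code."""

    def __init__(self, blacklist):
        self.blacklist = set(blacklist)

    @staticmethod
    def _corrections(tpw):
        # Approximations of swc-all, swc-first and rm-last.
        return {tpw.swapcase(), tpw[:1].swapcase() + tpw[1:], tpw[:-1]}

    def get_ball(self, tpw):
        """All corrections of tpw, with no knowledge of the blacklist."""
        return self._corrections(tpw)

    def check(self, tpw):
        """Corrections that are actually tried: blacklisted ones are skipped."""
        return {c for c in self._corrections(tpw) if c not in self.blacklist}

    def get_ball_union(self, tpwlist):
        """Union over all guesses of the checked passwords and the guess itself."""
        union = set()
        for tpw in tpwlist:
            # Assumed fix: use self.check(tpw) where self.get_ball(tpw) was used.
            union |= self.check(tpw) | {tpw}
        return union


fixer = ToyTypoFixer(["rockyou", "iloveyou"])
union = fixer.get_ball_union(["rockyou2", "iloveyou1"])
```

In this toy setup `get_ball("rockyou2")` still contains rockyou, while the union built from `check` does not.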


ppartarr mentioned this issue Mar 11, 2021

Fix lambda greedy q #3

Merged
