Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about lambda greedy calculation #2

Open
ppartarr opened this issue Feb 17, 2021 · 3 comments
Open

Question about lambda greedy calculation #2

ppartarr opened this issue Feb 17, 2021 · 3 comments

Comments

@ppartarr
Copy link
Contributor

ppartarr commented Feb 17, 2021

Hi @rchatterjee,

I really like your work on typo correction! I read your 2016 paper and I've been digging through the code to try and understand it better.

I am curious about how the security loss lambda q greedy is calculated for the various checkers. After solving the best-q-guesses problem in your experiment, you sum the probability of the union ball for every password in the best greedy guesses:

ball = typofixer.get_ball_union(tpwlist[:q])

union_ball = set([

I understand that the union ball would be the checked ball for the always checker but this isn't the case for the blacklist & optimal checkers. It seems to me that lambda q greedy should be calculated using the checked ball with typofixer.check(password) | set([password]).

Looking forward to hearing back from you!

@rchatterjee
Copy link
Owner

So, if I understand correctly, you are asking why does the \lambda_q^greedy take union over the guesses?
The ball(tpw) denotes the set of all real passwords, for which tpw is a valid typo. Now, if the attacker guesses tpw, it will get an advantage equivalent to sum([p(rpw) for rpw in ball(tpw)]). This is exactly what will happen for q=1. Now extend this to q>1, we need to take union of balls, which is done by typofixer.get_ball_union, which you can find in this line.
Does this clarify your doubt?

Also, typofixer.check(password) | set(password) is not correct, as password is a string, and set(password) will create a set with the characters from the password.

@ppartarr
Copy link
Contributor Author

ppartarr commented Feb 18, 2021

Thanks for your answer! It didn't quite clarify what I'm confused about, so allow me to rephrase 😄

Why does lambda q greedy take the union ball instead of taking the union of the checked passwords and the password itself?

Say we have q = 1 and we are using the blacklist checker with the blacklist shown below. Let's use top 3 correctors swc-all, swc-first, rm-last. The attacker is an exact knowledge attacker and knows the password distribution. For the sake of this example let's say that after solving the greedy weighted max heap coverage problem, the attacker guesses "rockyou2".

We have typofixer.get_ball_union(["rockyou2"]) = ["Rockyou2", "ROCKYOU2", "rockyou", "rockyou2"] but if the attacker submits rockyou2 the checked passwords will be typofixer.check(tpw) | set([tpw]) = ["Rockyou2", "ROCKYOU2", "rockyou2"]

Notice how rockyou isn't checked under the blacklist checker because it is in the blacklist. Since it's not being checked, I'm confused about why it's probability is included in the calculation for the security loss lambda q greedy.

typofixer.check(tpw) | set([tpw]) is also how the weights are calculated

10 most frequent passwords in rockyou

123456
12345
123456789
password
iloveyou
princess
1234567
rockyou
12345678
abc123

Edit: I corrected the previous question to use set([password]) instead of the set(password)

@rchatterjee
Copy link
Owner

rchatterjee commented Mar 4, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants