Pwmodels

Password research often requires modelling password distributions from a password leak. (I have had to rewrite similar code at least four times for different projects.) Hence, this module!

I plan to add different password models to this module, such as n-gram and PCFG. The current version supports:

n-gram or Markovian password model
Weir et al. PCFG, and
simple histogram of the passwords.

Install

$ pip install git+https://github.com/rchatterjee/pwmodels.git

Usage

from pwmodel import NGramPw, PcfgPw, HistPw
pwm = NGramPw(pwfilename='/Users/badger/passwords/myspace-withcount.tar.bz2', n=4)
print(pwm.prob('passwords123'))

See tests/ for more usage information.
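
To make concrete what an n-gram (Markovian) model computes for a password, here is a minimal sketch. It is not the code in pwmodel: the function names, the START/END padding symbols, and the absence of smoothing are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical padding symbols for this sketch (not taken from pwmodel).
START, END = "\x02", "\x03"


def train_ngrams(pw_counts, n=3):
    """Count each n-gram and its (n-1)-character context, weighted by frequency."""
    ngrams, contexts = Counter(), Counter()
    for pw, freq in pw_counts.items():
        padded = START * (n - 1) + pw + END
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n]
            ngrams[gram] += freq
            contexts[gram[:-1]] += freq
    return ngrams, contexts


def ngram_prob(pw, ngrams, contexts, n=3):
    """Multiply the conditional probability of each character given its context."""
    padded = START * (n - 1) + pw + END
    prob = 1.0
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        if contexts[gram[:-1]] == 0:
            return 0.0  # unseen context; this sketch does no smoothing
        prob *= ngrams[gram] / contexts[gram[:-1]]
    return prob


# Toy leak: password -> count, as in a `uniq -c` style file.
toy_ngrams, toy_contexts = train_ngrams({"ab": 3, "ac": 1}, n=2)
p_ab = ngram_prob("ab", toy_ngrams, toy_contexts, n=2)
p_ac = ngram_prob("ac", toy_ngrams, toy_contexts, n=2)
p_ad = ngram_prob("ad", toy_ngrams, toy_contexts, n=2)
```

Because the sketch stores a count for every context, the denominator for the first character is a single lookup rather than a sum over all passwords.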

`Passwords` module

The file src/pwmodel/readpw.py contains a Passwords class. This class makes it much easier to read password files, especially the ones created with the uniq -c command. It converts a single password file into two files: (1) a marisa-trie .trie file that contains all the passwords in a prefix trie format, and (2) a numpy array in .npz format that contains the frequencies of the passwords. This is significantly better in space (on disk and in memory) and in speed for many operations, such as sampling passwords according to the distribution, finding guess ranks of a list of passwords, or getting the frequency/probability of a password. Every password w is assigned a unique id i, and the frequency of that password is stored at the i-th location in the array.

>>> from pwmodel import readpw
>>> pwm = readpw.Passwords(pass_file=fname,limit=int(limit))
>>> pwm.pw2id('password12')
367412281
>>> pwm.sample_pws(10)
<generator object Passwords.sample_pws.<locals>.<genexpr> at 0x7f3d5adf8518>
>>> l = list(pwm.sample_pws(10))
>>> l
['qwertasdfg',
 'jamez9',
 'sadigojy',
 'kastorka89055082696',
 '062766',
 '14geno',
 'love80384',
 'jessica13',
 '0550135855',
 'estevan12']
>>> pwm.guessranks(l)
array([     5873,   8763930, 103938240, 103938240,   1836406,  10060962,
       103938240,      6713, 103938240,   1113414])
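
The id-plus-frequency-array layout described above can be sketched with the standard library alone. This is only an illustration of the idea: a dict and a list stand in for the marisa-trie and the numpy array, and the function names and the rank convention for unseen passwords are assumptions, not the behaviour of readpw.Passwords.

```python
import random

# Toy leak in the shape of `uniq -c` output: (count, password) pairs.
leak = [(50, "123456"), (30, "password"), (15, "qwerty"), (5, "jessica13")]

# Each password gets an id i; freqs[i] holds the count of the password with id i.
pw2id = {pw: i for i, (_, pw) in enumerate(leak)}
id2pw = [pw for _, pw in leak]
freqs = [count for count, _ in leak]
total = sum(freqs)

# Ids sorted by decreasing frequency give the guessing order.
order = sorted(range(len(freqs)), key=lambda i: -freqs[i])
rank_of_id = {i: rank for rank, i in enumerate(order, start=1)}


def prob(pw):
    """Empirical probability of pw in the leak (0.0 if absent)."""
    i = pw2id.get(pw)
    return freqs[i] / total if i is not None else 0.0


def sample_pws(k, rng=random):
    """Draw k passwords according to the empirical distribution."""
    ids = rng.choices(range(len(freqs)), weights=freqs, k=k)
    return [id2pw[i] for i in ids]


def guessrank(pw):
    """1-based rank in decreasing-frequency order.

    Assumption of this sketch: unseen passwords get rank len(freqs) + 1.
    """
    i = pw2id.get(pw)
    return rank_of_id[i] if i is not None else len(freqs) + 1


samples = sample_pws(3)
```

Keeping the counts in one flat array indexed by id is what makes weighted sampling and rank lookups cheap.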

Version 1.3

TODO

~~Add a function to enable the models to churn out passwords in decreasing order of their probability~~
Add a better PCFG model, especially one updated with keyboard sequences and repeating characters, and a more natural way of splitting the password than just continuous sequences of letters, digits and symbols.
The n-gram model is pretty slow now, because it has to compute the sum of the frequencies of all the passwords that start with START (which is a lot).
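
For reference, the splitting that the PCFG TODO item calls too simple (contiguous runs of letters, digits and symbols) looks roughly like the sketch below. The function names and the L/D/S labels are assumptions for illustration and are not taken from the pwmodel source.

```python
import re

# One run of ASCII letters (L), one run of digits (D), or one run of anything else (S).
_SEGMENT = re.compile(r"(?P<L>[A-Za-z]+)|(?P<D>[0-9]+)|(?P<S>[^A-Za-z0-9]+)")


def segment(pw):
    """Split pw into (class, text) runs of letters, digits and symbols."""
    return [(m.lastgroup, m.group()) for m in _SEGMENT.finditer(pw)]


def structure(pw):
    """Base structure string such as 'L8D3S1' for 'password123!'."""
    return "".join(f"{cls}{len(text)}" for cls, text in segment(pw))


parts = segment("password123!")
shape = structure("password123!")
keyboard_shape = structure("1qaz2wsx")
```

A keyboard walk such as 1qaz2wsx comes out as four separate fragments (D1L3D1L3), which is the kind of case a better model would treat as a single unit.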

Changelog

Added readpw.py, a utility to read password leak data.
