In this repository we provide software to reason analytically and efficiently about transformation-based password cracking in software tools like John the Ripper (JtR) and Hashcat (HC).
- Our implementation is able to reduce the time it takes to estimate password strength via JtR / HC by orders of magnitude.
- Our software can leverage revealed password data to improve orderings of transformation rules and to identify rules and words potentially missing from an attack configuration.
Our software uses:
Thus, your system must be capable of compiling both. While we take care of this crucial step, you likely will need to install:
# To compile JtR we need
sudo apt-get install build-essential
sudo apt-get install libssl-dev
sudo apt-get install ocl-icd-opencl-dev opencl-headers pocl-opencl-icd
If you encounter any issues, please refer to JtR's installation guide.
Our tools are tested on Ubuntu 18.04 but might work on macOS, too. Our tools should support Python 3.5+, but are only tested on Python 3.6.
Generally you need to do:
# Our tool needs
sudo apt-get install build-essential python3-pip git
After this, we've built a script for you to one-click install our tool. Simple download the setup.sh file and run.
# Do not clone this repository yourself, setup.sh will do this for you!
wget https://raw.githubusercontent.com/UChicagoSUPERgroup/analytic-password-cracking/master/setup.sh
chmod +x setup.sh
# Super user permissions not required
./setup.sh
Please follow the steps in setup.sh
. If you have problem building JtR's source code, please refer to their repository for building help.
First enter the demo directory
cd analytic-password-cracking/demo
We've put some demo wordlists, rulelists and testsets there for you to play with. For example, you can run the following cmds to output an estimated guess number for each password in demo.txt
, assuming the wordlist is demo.lst
and the rulelist is demo_HC.rule
/ demo_JtR.rule
.
# For a Hashcat Rulelist
python3 demo_guess_count_file.py --word ../data/wordlists/demo.lst --rule ../data/rulelists/demo_HC.rule --pw ../data/testsets/demo.txt -s h
# For a John the Ripper Rulelist
python3 demo_guess_count_file.py --word ../data/wordlists/demo.lst --rule ../data/rulelists/demo_JtR.rule --pw ../data/testsets/demo.txt -s j
The result is saved in the results
directory with filename starting with demo_file-<wordlist>-<rulefile>-<passwordfile>.log
.
Example result:
INFO:root: # Debug Info
PasswordIdx:17 # Password ID
Password:panther1 # The correctly guessed password (from password file)
Rule:$1 # The successful rule used to produce the guess (from rule file)
Word:panther # The successful base word that was mangled (from wordlist)
Guess:19926 ( 0 - 70920 ) # Estimated guess number (lower - upper bounds)
Suppose you have a wordlist, rulelist and a file of passwords, you want to know the guess number, you could run the following commands. Note that many time for large wordlists it would take hours to preprocess.
# For a Hashcat Style
python3 demo_guess_count_file.py --word /path/to/wordlist --rule /path/to/rulelist --pw /path/to/passwordfile -s h
# For a John the Ripper Style
python3 demo_guess_count_file.py --word /path/to/wordlist --rule /path/to/rulelist --pw /path/to/passwordfile -s j
Suppose you have a wordlist, rulelist and a file of passwords. You you want to know the guess number, and you also want to filter out passwords that don't meet with the password policy. Suppose the password policy is length >= 6
, has a digit
, and has a letter
.
# For a Hashcat Style
python3 demo_guess_count_file.py --word /path/to/wordlist --rule /path/to/rulelist --pw /path/to/passwordfile -s h --length=6 --digit --letter
# For a John the Ripper Style
python3 demo_guess_count_file.py --word /path/to/wordlist --rule /path/to/rulelist --pw /path/to/passwordfile -s j --length=6 --digit --letter
There are 5 types of password policies we support: length, digits, letters, uppercase letters, lowercase letters. You can use command line flags to set each type to true. For example:
A policy that requires passwords to have a uppercase letter:
--upper
A policy that requires passwords to have length >= 8:
--length=8
A policy that requires passwords to have length >= 6, have a digit, and have a letter:
--length=6 --digit --letter
demo_guess_count_file.py [-h] -w WORDLIST_ADDR -r RULELIST_ADDR -p
PWLIST_ADDR [-d] -s
{j,jtr,JTR,JtR,John,john,J,John The
Ripper,Jtr,h,hc,HC,hashcat,H,Hashcat,Hc}
[--length {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34}]
[--digit] [--letter] [--lower] [--upper]
optional arguments:
-h, --help Show this help message and exit
-w WORDLIST_ADDR, --word WORDLIST_ADDR
Pointing to wordlist (Required)
-r RULELIST_ADDR, --rule RULELIST_ADDR
Pointing to rule list (Required)
-p PWLIST_ADDR, --pw PWLIST_ADDR
Pointing to test/victim set (Required)
-d, --debug More debug information
-s --style Can use: j/jtr for John the Ripper and h/hc for Hashcat
Run the program in JtR/HC style (Required)
--length Adding a password policy that require at least (>=) length N to make the guess
(Must be an integer from [1-34])
--digit Adding a password policy that require a digit to make the guess
--letter Adding a password policy that require a letter to make the guess
--lower Adding a password policy that require an lowercase letter to make the guess
--upper Adding a password policy that require an uppercase letter to make the guess
To specify runtime configurations, edit config.py.
Caution: By default we use look
command for binary search files, which is built-in for macOS and Ubuntu. However, the look
command in Linux only supports files < 2GB, so you should patch it for large datasets.
'running_style': Running style. Don't use this for now.
'max_password_length': An integer. Inputs/outputs greater than this are ignored
'min_cut_length': max_password_length + 1
'm_threshold': Threshold for inverting `ONM | Omit range` command
'executable_path': External JtR executable
'password_policy': The password policy specified. Use cmd line options, don't configure it here, use args instead.
'preprocess_path': Linked to preprocess root directory
'enable_regex': Whether to enable_regex or not. Only for internal testing
'debug': If in debug mode or not.
'binary_search_file_executable': The program to perform binary search. Use `look` by default (built-in on Ubuntu and macOS).
'lookup_threshold': If the number of preimages are more than this, use trie search.
'running_style': Running style. Don't use this for now.
'max_password_length': An integer. Inputs/outputs greater than this are ignored
'min_cut_length': max_password_length + 1
'm_threshold': Threshold for inverting `ONM | Omit range` command
'executable_path': External HC executable
'password_policy': The password policy specified. Use cmd line options, don't configure it here, use args instead.
'preprocess_path': Linked to preprocess root directory
'enable_regex': Whether to enable_regex or not. Only for internal testing
'debug': If in debug mode or not.
'binary_search_file_executable': The program to perform binary search. Use `look` by default (built-in on Ubuntu and macOS).
'lookup_threshold': If the number of preimages are more than this, use trie search.
'batch_size_of_words': An integer, how many words in a batch
'batch_size_of_rules': An integer or "auto", how many rules in a batch
The high-level workflow of demo_guess_count_file.py is like this:
- Read wordlist/rulelist/testset
- Do preprocessing on the rulelist.
- Identify special rules. Determine whether each rule is invertible/countable.
- If the rule is uninvertible, pipe data to disk (enumeration).
- If the rule is uncountable, pipe just a number to disk (just a number).
- If the rule is countable, figure out all the dependencies, build the tensor, make a pass on the wordlist, fill the tensor.
- Get count for countable rules.
- Other running-specific preparations.
- Inversion
- If invertible, invert the password through the rule, get the preimages, do constant time lookups on the wordlist or trie search (if too many preimages)
- If uninvertible, generally do binary search on the piped file.
- Output results (stored in
results
directory).
Reasoning Analytically About Password-Cracking Software
├── data # Input/Intermediate data
├── demo # Built-in demos
├── src # Source files
├── tests # Automated tests
├── results # Results
├── trie # External library
├── HashcatRulesEngine # External library
├── JohnTheRipper # External library
└── README.md
.
├── ...
├── src
│ ├── argparsing.py # Parse args for demo/
│ ├── clean_hashes.py # Clean fingerprint of last run
│ ├── common.py # Common classes used across different modules
│ ├── config.py # Runtime configurations
│ ├── demo_common.py # Common functions for demo/
│ ├── feature.py # Definition of different features
│ ├── feature_extraction.py # Feature extraction
│ ├── guess_count.py # Guess_count and related functions
│ ├── invert_helper.py # Utility functions and definitions for invert_rule
│ ├── invert_rule.py # Invert transformation rules
│ ├── parse.py # Rule parser
│ ├── preprocess.py # Preprocess
│ ├── tokenstr.py # Additional data structure used in invert_rule
│ └── utility.py # Utility functions used across different modules
└── ...
.
├── ...
├── tests
│ ├── test_guess_count.py # Test guess_count module in src directory
│ ├── test_guess_count_file # Test guess_count_file module in demo directory
│ ├── test_invert_rule.py # Test invert_rule module in src directory
│ └── test_parse.py # Test parse module in src directory
└── ...
.
├── ...
├── preprocess # Save preprocess data, mostly enumerated data and count
│ ├── count # Counts for uncountable rules
│ └── enumerated # Enumerated data of uninvertible rules
├── rulelists # Built-in rulelists
├── testsets # Built-in testsets
├── wordlists # Built-in wordlists
└── ...
.
├── ...
├── demo
│ └── demo_guess_count_file.py # Guess number estimation given a file of passwords
└── ...
We recommend that you look at the setup.sh
script and execute the commands there one by one and see where you fail. Our code is mostly written in Python 3 and should be compatible with Python 3.5 and Python 3.6. However, we do run external programs by building them from source. Namely John the Ripper (JtR) and HashcatRulesEngine. HashcatRulesEngine is written in C and should be compiled easily. JtR has more complex dependencies. If you fail to build JtR, please refer to JtR's repository. For all the libraries and dependencies we use, please look at the dependencies section below.
Python libraries:
numpy ('1.13.1')
pyparsing ('2.2.0')
cython ('0.26.1')
- chartrie
External libraries:
We express password policies using rejection rules (in JtR's character class style), currently there's no character class that captures all the symbols. Also, the definition of symbols varies. So we decide not to support symbols.
It's impossible to represent complex password policies in rejection rules, so we don't support them.
The results are stored in result
directory.
The guesses made by each rule is saved at preprocess_path/saved_counts.py
The piped results for uninvertible rules are saved at preprocess_path/enumerated/*.txt
This depends. If you change the wordlist, rulelist, or password policy, then yes, you have to preprocess again. However, we realize that you might want to run one configuration with multiple test sets. So if the wordlist, rulelist, password policy and running style are exactly the same as last run, we don't preprocess again. That is, if you only change test set every time you run it, it doesn't preprocess every time. And for where to find the preprocess data, please look at the sections above.
This saves the fingerprint of your last runtime configuration (aka wordlist, rulelist, password policy and running style), so that if you specify the same runtime configuration, we don't preprocess again.
Two options.
- Delete them by manually
cd src; python3 clean_hashes.py
I want it to be FASTER! Well, reasonable request. Using PyPy3.6 will give quite a lot speedup. You can also potentially parallelize the process for better performance.
This is software used and maintained for a research project and likely will have many bugs and issues.
Reasoning Analytically About Password-Cracking Software
@inproceedings{liu-19-mangling-rules,
author = {Liu, Enze and Nakanishi, Amanda and Golla, Maximilian and Cash, David and Ur, Blase},
title = {{Reasoning Analytically About Password-Cracking Software}},
booktitle = {IEEE Symposium on Security and Privacy},
year = {2019},
series = {SP~'19},
pages = {1272--1289},
address = {San Francisco, California, USA},
month = may,
publisher = {IEEE}
}
We'd like to express our gratefulness to whoever has the many people who helped our project, including but not limited to: Nasr Maswood; the authors of JtR
and HC
; other members of the JtR
and HC
communities who provided valuable feedback; the authors of chartrie
; the tech staff at UChicago; and Chameleon Cloud.
This software is licensed under the MIT license. Refer to docs/LICENSE for more information.
Technical questions? Contact Enze (Alex) Liu. You can also visit our website. If you are interested in passwords, consider contributing.