- Indistinguishability from random noise (ie, IND$-CPA): An encrypted file
  is effectively indistinguishable from random noise of the same length.
  - This is achieved by having all data output by the encryption process be
    either random data or the output of a stream cipher.
- Committing authenticity: If a file successfully decrypts to a plaintext
  `PT` using some password `PW`, you can conclude that only `PW` will
  decrypt that file, that that file will only decrypt to `PT`, and that
  someone who knew `PW` created it.
  - This is achieved by (a) deriving the HMAC and encryption keys via
    collision-resistant hash functions (`argon2i` and `blake2b`), and
    (b) authenticating with an HMAC over the whole file.
  - Any violation of this property should yield a collision in either
    `argon2i` or `blake2b`.
  - A related secondary property that `scromble` aims for is that it only
    outputs data that is guaranteed to come from the HMAC-authenticated
    file, even if someone changes the underlying file while `scromble` is
    reading it. This is achieved via the HMAC prefix table.
- Moderate length obfuscation: The exact length of the underlying file is
  difficult to tell given the length of the encrypted file.
  - This is the most dubious property, and `scromble` should not be relied
    upon if the exact size of your data is a critical piece of information
    to hide.
  - `scromble` randomly pads your file (up to 100% for small files, up to
    20% for large files; see Maximum Pad Scale), but it can be given a
    custom random padding range if need be.
  - When decrypting, `scromble` doesn't output the padding data, so an
    adversary can observe how big the file is if you let them observe the
    time or filesystem traffic totals of decryption. This kind of problem
    is basically unavoidable without implementing Oblivious RAM.
  - This helps somewhat when you have certain very specific file sizes, eg:
    - 41-byte files (git branch files: a 40-byte hex string + a newline)
    - exact-power-of-2 files
    - 1032-byte files (on my system, many Wine DLLs are this size)
The cryptographic RNG used is `rand_chacha`'s `ChaCha20Rng`, initialized
via `from_entropy()` (see
https://rust-random.github.io/rand/rand_chacha/struct.ChaCha20Rng.html and
https://rust-random.github.io/rand/rand_core/trait.SeedableRng.html#method.from_entropy
for more information).
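
As a sketch (not `scromble`'s actual source), seeding such an RNG and
drawing a 64-byte block might look like:

```rust
use rand_chacha::ChaCha20Rng;
use rand_core::{RngCore, SeedableRng};

// Generate a 64-byte random block (eg, a salt or nonce block) from a
// ChaCha20-based RNG seeded from the operating system's entropy source.
// from_entropy() requires rand_core's `getrandom` feature.
fn random_block() -> [u8; 64] {
    let mut rng = ChaCha20Rng::from_entropy();
    let mut block = [0u8; 64];
    rng.fill_bytes(&mut block);
    block
}
```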
`a||b` denotes the concatenation of byte sequences `a` and `b`.

`0^n` denotes the byte sequence consisting of `n` zero bytes.
`Argon2` is used in two modes: `argon2i` and `argon2id`. `argon2i` is
configured to use `2^14` blocks (ie, 16MiB) of memory (`M = 1<<14`), one
pass (`T = 1`), and a parallelism factor of 2 (`P = 2`). `argon2id` is
configured with `M = 1<<13`, `T = 2`, `P = 2`. The output size in both
cases is 64 bytes.

For rationale, see Argon2 and the Root Key.
`mac2b(key,data)` is the result of running BLAKE2b with personalization
string "sCrOmB2AuThEnTiC", output length 64 bytes, and key `key` on data
`data`.

`key2b(L,key,data)` is the result of running BLAKE2b with personalization
string "sCrOmB2EnCrYpToR", output length `L` bytes, and key `key` on data
`data`. In all cases, `L = 32` or `L = 64`.
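
As a sketch, these two primitives map onto the keyed
BLAKE2b-with-personalization API in the RustCrypto `blake2` crate
(whether `scromble` uses this exact crate is an assumption on my part):

```rust
use blake2::digest::Mac;
use blake2::Blake2bMac512;

// mac2b(key, data): keyed BLAKE2b-512 with personalization "sCrOmB2AuThEnTiC".
fn mac2b(key: &[u8], data: &[u8]) -> [u8; 64] {
    let mut m = Blake2bMac512::new_with_salt_and_personal(key, b"", b"sCrOmB2AuThEnTiC")
        .expect("key must be 1..=64 bytes");
    m.update(data);
    let mut out = [0u8; 64];
    out.copy_from_slice(&m.finalize().into_bytes());
    out
}

// key2b(64, key, data): keyed BLAKE2b-512 with personalization
// "sCrOmB2EnCrYpToR". The L = 32 case would use Blake2bMac<U32> instead.
fn key2b64(key: &[u8], data: &[u8]) -> [u8; 64] {
    let mut m = Blake2bMac512::new_with_salt_and_personal(key, b"", b"sCrOmB2EnCrYpToR")
        .expect("key must be 1..=64 bytes");
    m.update(data);
    let mut out = [0u8; 64];
    out.copy_from_slice(&m.finalize().into_bytes());
    out
}
```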
`xchacha20(key,nonce,data)` is the result of running XChaCha20 with the
256-bit key `key` and 192-bit nonce `nonce` on the data stream `data`.

NOTE: `xchacha20(key,nonce,xchacha20(key,nonce,data)) == data`.
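
A sketch of this primitive against the RustCrypto `chacha20` crate (again,
the exact dependency is an assumption). The NOTE above holds because a
stream cipher XORs data with a keystream, so applying it twice is a no-op:

```rust
use chacha20::cipher::{KeyIvInit, StreamCipher};
use chacha20::XChaCha20;

// xchacha20(key, nonce, data), applied in place. Encryption and
// decryption are the same operation.
fn xchacha20(key: [u8; 32], nonce: [u8; 24], data: &mut [u8]) {
    let mut cipher = XChaCha20::new(&key.into(), &nonce.into());
    cipher.apply_keystream(data);
}
```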
The salt block `SB` is 64 bytes of data generated from a cryptographic RNG.

The root key `RK` is derived by:

```
RK_i  := argon2i( password, key2b(64,SB,"argon2i"))
RK_id := argon2id(password, key2b(64,SB,"argon2id"))
RK    := key2b(64, 0, "root"||RK_i||RK_id)
```
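
A sketch of this derivation using the RustCrypto `argon2` crate and the
`key2b64` helper sketched earlier. The crate choice, and the all-zero
64-byte key standing in for the `0` key above, are both assumptions:

```rust
use argon2::{Algorithm, Argon2, Params, Version};

// Derive RK from the password and salt block SB, per the formulas above.
fn derive_root_key(password: &[u8], sb: &[u8; 64]) -> [u8; 64] {
    // One Argon2 run with a 64-byte output; P = 2 in both modes, per the
    // Argon2 configuration section above.
    let argon2 = |alg: Algorithm, m_cost: u32, t_cost: u32, salt: &[u8]| -> [u8; 64] {
        let params = Params::new(m_cost, t_cost, 2, Some(64)).expect("valid params");
        let mut out = [0u8; 64];
        Argon2::new(alg, Version::V0x13, params)
            .hash_password_into(password, salt, &mut out)
            .expect("argon2 failed");
        out
    };
    let rk_i = argon2(Algorithm::Argon2i, 1 << 14, 1, &key2b64(sb, b"argon2i"));
    let rk_id = argon2(Algorithm::Argon2id, 1 << 13, 2, &key2b64(sb, b"argon2id"));

    // RK := key2b(64, 0, "root" || RK_i || RK_id). An all-zero key stands
    // in for "0" here; this is a guess at the spec's meaning.
    let mut msg = Vec::with_capacity(4 + 64 + 64);
    msg.extend_from_slice(b"root");
    msg.extend_from_slice(&rk_i);
    msg.extend_from_slice(&rk_id);
    key2b64(&[0u8; 64], &msg)
}
```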
The nonce block `NB` is 64 bytes of data generated from a cryptographic
RNG. The first 24 bytes (ie, 192 bits) of `NB` are the nonce `NONCE`.

The HMAC key is derived by `HK := key2b(64,RK,"hmac")`.

The encryption key is derived by `EK := key2b(32,RK,"encrypt")`.
The underlying file is treated as a sequence of bytes `PT` with a 64-bit
length (measured in bytes) denoted `length(PT)`. The length can be encoded
as a little-endian 8-byte sequence `le_length(PT)`.
The maximum pad scale `MAXPAD` used when encrypting can be manually set
with `--pad-factor`. If it is not manually set, it is chosen as follows:

- `1.0`, if `length(PT) <= 2048`
- `1.0 - 0.8*(length(PT)-2048)/(65536 - 2048)`, if `2048 < length(PT) <= 65536`
- `0.2`, if `length(PT) > 65536`

ie, `MAXPAD` starts at `1.0`, then linearly goes down to `0.2` over
`[2^11, 2^16]`.
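
In code form, that schedule is a direct transcription (a sketch, not
`scromble`'s source):

```rust
// MAXPAD as a function of the plaintext length, per the table above.
fn max_pad(pt_len: u64) -> f64 {
    if pt_len <= 2048 {
        1.0
    } else if pt_len <= 65536 {
        1.0 - 0.8 * (pt_len - 2048) as f64 / (65536.0 - 2048.0)
    } else {
        0.2
    }
}
```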
The pad length `PADLEN` is some uniformly randomly chosen value between 0
and `MAXPAD*max(64,length(PT))`, ie:

- if `length(PT) < 64`, `PADLEN` is sampled from `[0,MAXPAD*64]`
- otherwise, `PADLEN` is sampled from `[0,MAXPAD*length(PT)]`
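
Sampling `PADLEN` could then look like the following sketch, using the
`rand` crate's range sampling and the `max_pad` helper above:

```rust
use rand::Rng;
use rand_chacha::ChaCha20Rng;

// PADLEN is uniform over [0, MAXPAD * max(64, length(PT))].
fn pad_len(rng: &mut ChaCha20Rng, pt_len: u64) -> u64 {
    let bound = (max_pad(pt_len) * pt_len.max(64) as f64) as u64;
    rng.gen_range(0..=bound)
}
```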
The padded plaintext `PPT` of some plaintext data stream `PT` is:

```
PPT = (PT||0^PADLEN||le_length(PT))
```

where `0^PADLEN` denotes a sequence of `PADLEN` zero bytes.
The core ciphertext `CC` is `NB||xchacha20(EK,NONCE,PPT)`.

The MAC block `MB` is 64 bytes of data formed by running
`mac2b(HK,SB||CC)`.

A non-shared encrypted file is `SB||CC||MB`.
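
Putting the last few definitions together, a non-streaming sketch of
building `CC` (using the `xchacha20` helper sketched earlier):

```rust
// Build CC = NB || xchacha20(EK, NONCE, PPT), where
// PPT = PT || 0^PADLEN || le_length(PT).
fn core_ciphertext(ek: [u8; 32], nb: &[u8; 64], pt: &[u8], padlen: usize) -> Vec<u8> {
    let mut ppt = pt.to_vec();
    ppt.resize(pt.len() + padlen, 0u8);                      // 0^PADLEN
    ppt.extend_from_slice(&(pt.len() as u64).to_le_bytes()); // le_length(PT)

    let nonce: [u8; 24] = nb[..24].try_into().unwrap();      // NONCE
    xchacha20(ek, nonce, &mut ppt);

    let mut cc = nb.to_vec();
    cc.extend_from_slice(&ppt);
    cc
}
```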
Using the definitions above, encryption of a plaintext file `PT` is then:

- Read the password `password`.
- Generate `SB` and `NB`.
- Calculate the root key from `password` and `SB`.
- Calculate `HK` and `EK` from `RK`.
- Calculate `MAXPAD` from `length(PT)` and generate `PADLEN`.
- Generate the core ciphertext `CC` from this data and the plaintext `PT`.
- Calculate the MAC block `MB` from `HK` and `SB||CC`.
- Output the encrypted file.
The actual implementation does this in a streaming manner.
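
For concreteness, here is a rough non-streaming sketch stitching the
earlier helpers together; `key2b32` is a hypothetical 32-byte sibling of
`key2b64`, and none of this is `scromble`'s actual source:

```rust
use rand_core::RngCore;

// Non-streaming sketch of encryption: output SB || CC || MB.
fn encrypt(password: &[u8], pt: &[u8], rng: &mut rand_chacha::ChaCha20Rng) -> Vec<u8> {
    let mut sb = [0u8; 64];
    rng.fill_bytes(&mut sb); // salt block
    let mut nb = [0u8; 64];
    rng.fill_bytes(&mut nb); // nonce block

    let rk = derive_root_key(password, &sb);
    let hk = key2b64(&rk, b"hmac");    // HK
    let ek = key2b32(&rk, b"encrypt"); // EK (hypothetical 32-byte helper)

    let padlen = pad_len(rng, pt.len() as u64) as usize;
    let cc = core_ciphertext(ek, &nb, pt, padlen);

    let mut out = sb.to_vec();
    out.extend_from_slice(&cc);
    out.extend_from_slice(&mac2b(&hk, &out)); // MB = mac2b(HK, SB||CC)
    out
}
```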
Decryption uses most of the same building blocks as encryption, but there
are a few extra principles decryption must follow:

- Under no circumstances will we do anything with the ciphertext other
  than MAC checking before the MAC has been checked. For some reasonable
  commentary about why this is an important design principle, see
  [this blog post](https://moxie.org/2011/12/13/the-cryptographic-doom-principle.html).
- No data inconsistent with the MAC will be output to stdout.
- No properties of the underlying data will be checked by `scromble`.
The second principle requires a secondary data structure I refer to as the "HMAC prefix table". See the HMAC prefix table section for more details.
Using the definitions above, decryption of a ciphertext file is then:

Pass 1:

- Read the password `password`.
- Read `SB`.
- Calculate the root key from `password` and `SB`.
- Calculate `HK` from `RK`.
- Read the file as `SB||CC||MB`, build the HMAC prefix table, save the
  last 8 bytes of `CC` as `enc_supposed_len`, and check that
  `length(CC) >= 8` and `MB == mac2b(HK,SB||CC)`. If either check fails,
  exit with an error.
From this point on, all operations involving data read from the file will be checked against the HMAC prefix table before any further processing is done on them. See HMAC prefix table.
Pass 2:

- Calculate `EK` from `RK`.
- Read `SB` and `NB`, and parse `NONCE` out of `NB`.
- Calculate `supposed_len` by decrypting `enc_supposed_len` with `EK` and
  `NONCE` and interpreting it as a 64-bit little-endian integer. This
  decryption takes advantage of the fact that `XChaCha20` is seekable.
- Calculate `length(PT) := min(supposed_len,length(CC)-8)`.
- Read `CC`, checking against the HMAC prefix table before doing any
  processing. If any check fails, exit with an error.
- Output the first `length(PT)` bytes of `xchacha20(EK,NONCE,CC)`.
- Finally, check that the file ends with `MB`. If not, exit with an error.
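
The seekable decryption of `enc_supposed_len` might look like the sketch
below, via the `cipher` crate's `StreamCipherSeek` trait. The offset
computation is my inference from the `PPT` definition (the length field is
the last 8 bytes of the XChaCha20 stream, which starts 64 bytes into `CC`):

```rust
use chacha20::cipher::{KeyIvInit, StreamCipher, StreamCipherSeek};
use chacha20::XChaCha20;

// Decrypt only the trailing 8-byte length field by seeking the keystream
// to its position within PPT: length(CC) - 64 - 8.
fn supposed_len(ek: [u8; 32], nonce: [u8; 24], enc_len: [u8; 8], cc_len: u64) -> u64 {
    let mut cipher = XChaCha20::new(&ek.into(), &nonce.into());
    cipher.seek(cc_len - 64 - 8); // assumes cc_len >= 72, checked in Pass 1
    let mut buf = enc_len;
    cipher.apply_keystream(&mut buf);
    u64::from_le_bytes(buf)
}
```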
Once `scromble` has read the file and checked its HMAC, it must read
through the file a second time in order to actually decrypt it. However,
this leads to a potential issue: what if the file changes between the HMAC
check and when `scromble` reads some particular piece of the file? Then
potentially arbitrary data could get through, even though the user
believes the data has been certified as authentic.
This does not seem like an easily exploitable flaw, but who knows what crazy situation someone might be using this tool in? It's better to build a tool with rock solid correctness properties than to brush a clear issue aside with "that seems too hard to exploit".
`scromble` builds an in-memory data structure during its first pass so
that every data read on the second pass can be checked for authenticity,
at the cost of some extra memory usage (at most `sqrt(N)*17` bytes of
memory usage for an N-byte file).
The HMAC prefix table consists of a block size `block_size` and a sequence
of "prefix HMACs" `hmac_prefs`. For each `i`,
`hmac_prefs[i] == mac2b(HK,data[..(i*block_size)])`. During the HMAC-check
pass, the current HMAC result is appended to `hmac_prefs` every
`block_size` bytes. During the decryption pass, data is buffered until
`block_size` bytes have been processed, and then the HMAC prefix
containing those bytes is checked.
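
In data-structure form, the table might look like this (a sketch of the
description above, not `scromble`'s actual types):

```rust
// HMAC prefix table state. Invariant: for each i,
// hmac_prefs[i] == mac2b(HK, data[..(i * block_size)]).
struct HmacPrefixTable {
    block_size: usize,
    hmac_prefs: Vec<[u8; 64]>,
}
```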
To minimize the total memory overhead from the HMAC prefix table and the
buffer required to check against it, `block_size` gets updated
incrementally during the HMAC-check pass. If
`hmac_prefs.len()*HMAC_LEN >= 2*block_size` (here `HMAC_LEN == 64`), then
`block_size` gets updated to `block_size*2` and every even entry in
`hmac_prefs` gets deleted -- ie:

```
new_block_size == block_size*2;
new_hmac_prefs.len() == hmac_prefs.len()/2;
for each i such that 0 <= i and 2*i+1 < hmac_prefs.len():
    new_hmac_prefs[i] == hmac_prefs[2*i+1];
```
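
The doubling step could be implemented as below, matching the pseudocode
(keep the odd-indexed entries); again a sketch against the struct above:

```rust
impl HmacPrefixTable {
    // Double block_size and drop every even-indexed prefix HMAC, so that
    // new_hmac_prefs[i] == old_hmac_prefs[2*i + 1].
    fn double_block_size(&mut self) {
        self.block_size *= 2;
        let old = std::mem::take(&mut self.hmac_prefs);
        self.hmac_prefs = old.into_iter().skip(1).step_by(2).collect();
    }
}
```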
This scheme approximates the memory-usage-optimal HMAC prefix table size,
which is the solution to `(file_size/block_size)*HMAC_LEN == block_size`
-- ie, `block_size == sqrt(file_size*HMAC_LEN)`. The computational cost
required to double the block size is `O(1)` amortized (ie, every `2^n`th
step requires an extra `O(2^n)` work), and the worst-case memory required
is `block_size + (file_size/block_size)*HMAC_LEN`. The memory usage is
highest when

```
block_size + (file_size/block_size)*HMAC_LEN
    == 2*block_size + (file_size/(2*block_size))*HMAC_LEN
```

ie, cancelling one `block_size` from each side and solving,

```
block_size == (file_size/(2*block_size))*HMAC_LEN
block_size*block_size == 32*file_size
block_size == 4*sqrt(2)*sqrt(file_size)
```

and so

```
total_mem <= block_size + (file_size/block_size)*HMAC_LEN
total_mem <= block_size + 2*block_size
total_mem <= 3*4*sqrt(2)*sqrt(file_size)
total_mem <= (12*sqrt(2))*sqrt(file_size)
```
Running some numbers, this means that for this data structure, decrypting a:

- 1 mebibyte file requires <=17KiB of memory
- 1 gibibyte file requires <=544KiB of memory
- 1 tebibyte file requires <=17MiB of memory
- 1 pebibyte file requires <=544MiB of memory
- 1 exbibyte file requires <=17GiB of memory

Which seems like pretty reasonable memory overhead to me.
NOTE: for speed, `scromble` double-buffers the HMAC prefix table's read
buffer, so the actual memory usage is more like `4*block_size`, ie,
`(4/3)*total_mem`. So if you want to decrypt a 1 exbibyte file, you'll
need to use a computer with ~23GiB of memory.
`scromble`'s usage of `Argon2` is not, to the author's knowledge, in
compliance with the recommendations of RFC9106
(https://datatracker.ietf.org/doc/html/rfc9106), but there are several
contributing reasons for that.
There are two distinct threats `scromble`'s key derivation is susceptible
to.
First, an encrypted file may be attacked in an offline fashion by an adversary with large amounts of computing resources. This threat is best mitigated by using a memory-hard KDF, since it is commonly believed that the limiting factor of large-scale compute is memory bandwidth.
For the purposes of memory-hardness, `Argon2d` appears to be far better
than `Argon2i` -- multiple works have found significant memory-usage
savings in versions of `Argon2i` (https://eprint.iacr.org/2016/027.pdf,
https://eprint.iacr.org/2016/759.pdf), while `Argon2d` has no such known
attacks.
Second, an adversary with access to fine-grained timing data might discern information about either the password or the derived key.
`Argon2d` accesses memory in a data-dependent way, and thus is potentially
vulnerable to cache-timing side-channel analysis. Such analysis has not
been reported, but cache-timing attacks were also a purely theoretical
problem for AES for many years before an exploit was demonstrated.
`Argon2id` is a hybrid algorithm, which does data-independent `Argon2i`
during an initial phase, then switches to `Argon2d`. This prevents timing
side-channel attacks from learning a significant amount about the input
of the KDF, but there is no clear reason to believe that side-channel
analysis cannot reveal the output of the KDF, which is at least as
important.
In light of that, this recommendation from RFC9106, which was presented
without any justification, seems patently absurd:

> If you do not know the difference between the types or you consider
> side-channel attacks to be a viable threat, choose Argon2id.
To address both offline attacks and side-channel analysis, `scromble` uses
both `Argon2id` and `Argon2i`, then mixes them to derive the final root
key. A side-channel analysis will only be able to attack the `Argon2d`
component of `Argon2id`, and an offline attack will only have a low-memory
attack against the `Argon2i` portion.
Additionally, RFC9106 recommends 128-bit keys and salts. `scromble` aims
for a classical security level of 256 bits, so this is not acceptable.
In order to maximize the benefits of the `Argon2d` phase of `Argon2id`, a
minimum value of `T = 2` is used for `Argon2id`. Per RFC9106, a minimum
value of `T = 1` is used for `Argon2i`. In order to keep `scromble`
reasonably performant on low-end devices, a maximum value of `P = 2` is
chosen for both. Since `scromble`'s maximum speed is ~200MB/s, a target
time of ~0.1s at ~1500MHz was chosen, so key derivation accounts for less
than half the time spent on any file above 20MB. Setting `M = 1<<14` for
`Argon2i` and `M = 1<<13` for `Argon2id` takes roughly 0.092s total at
1.5GHz on my machine. After turning off throttling (max frequency of
~4.1GHz), the time with those settings is ~0.033s.