[Security vulnerability] Seeds are not changed on database save #219

Open · basak opened this issue Dec 15, 2020 · 18 comments · May be fixed by #366

basak (Contributor) commented Dec 15, 2020

Hi,

Thank you for pykeepass!

I noticed what I believe is a security issue with the way that pykeepass saves a database (I've tested only KDBX). None of the four seeds are regenerated. These are:

  • kdbx.header.value.dynamic_header.master_seed.data
  • kdbx.header.value.dynamic_header.encryption_iv.data
  • kdbx.header.value.dynamic_header.transform_seed.data
  • kdbx.header.value.dynamic_header.protected_stream_key.data

I believe this results in a number of issues. Note that I'm not a cryptographic expert. However, in my testing the official KeePass client regenerates these parameters on every save, which I think is reason enough for pykeepass not to deviate from that behaviour.

Some issues I believe exist with not regenerating the seeds:

  • Everyone who uses create_database() will get the same seed for password derivation, which makes them vulnerable to precomputation attacks.
  • I'm looking to use pykeepass to derive a "subset" password database for lower security devices. In this case, although I might generate the derived database from a master database, I would not expect the derived database to leak any information about the master database I specifically chose not to include. However, without regeneration of the seeds, an attacker with access to the derived database would be able to precompute tables that make the master password faster to crack if it were later exposed.
  • Not regenerating protected_stream_key is less severe, but it would still let an attacker who manages to access only the decrypted XML compare passwords from different users for equality; regenerating this header field would prevent that.

Note that this affects both create_database() and PyKeePass.save(). In both cases, all seeds should be regenerated, just like the "official" keepass client does it.

Here are test cases that should pass when this issue is fixed:

import pykeepass
import pytest


EXPECTED_SALT_ATTRS = [
    "kdbx.header.value.dynamic_header.master_seed.data",
    "kdbx.header.value.dynamic_header.encryption_iv.data",
    "kdbx.header.value.dynamic_header.transform_seed.data",
    "kdbx.header.value.dynamic_header.protected_stream_key.data",
]


def get_recursive_attr(obj, name):
    try:
        idx = name.index(".")
    except ValueError:
        return getattr(obj, name)
    else:
        return get_recursive_attr(getattr(obj, name[:idx]), name[idx + 1 :])


@pytest.mark.parametrize(["attr"], [(attr,) for attr in EXPECTED_SALT_ATTRS])
def test_creation_salts_differ(tmpdir, attr):
    a = pykeepass.create_database(str(tmpdir.join("a.kdbx")))
    b = pykeepass.create_database(str(tmpdir.join("b.kdbx")))

    assert get_recursive_attr(a, attr) != get_recursive_attr(b, attr)


@pytest.mark.parametrize(["attr"], [(attr,) for attr in EXPECTED_SALT_ATTRS])
def test_saved_salts_differ(tmpdir, attr):
    a = pykeepass.create_database(str(tmpdir.join("a.kdbx")))
    old_salt = get_recursive_attr(a, attr)
    a.save()
    b = pykeepass.PyKeePass(str(tmpdir.join("a.kdbx")))  # reopen the saved file
    new_salt = get_recursive_attr(b, attr)
    assert old_salt != new_salt

Evidlo (Member) commented Dec 15, 2020

Thanks for reporting this.

After looking through the keepassxc source, it seems like master_seed, encryption_iv, and protected_stream_key are indeed regenerated on save.

  • kdbx4
  • kdbx3

basak (Contributor, author) commented Dec 16, 2020

Yes, and transform_seed is also regenerated, just a little more indirectly.

I used keepass' behaviour to verify, as opposed to keepassxc, but the result is the same.

basak (Contributor, author) commented Dec 21, 2020

I implemented a workaround for now by poking into kdbx.header... and kdbx.body... and resetting the necessary values with secrets.token_bytes(len(previous_value)). There were a few gotchas which will probably also need to be tackled for a proper fix. Since my workaround lives in a pykeepass-using app rather than in pykeepass itself, it has diverged too far to apply directly to pykeepass, but hopefully the following will help:

  1. Probably obvious to pykeepass authors, but it wasn't to me: the correct attributes inside the kdbx structure to change differ depending on major_version; and for version 4, they differ again depending on kdbx.header.value.dynamic_header.kdf_parameters.data.dict["$UUID"].
  2. After modifying kdbx.header.value, I needed to del kdbx.header.data, because a RawCopy will rebuild from its data attribute if that still exists. I filed construct/construct#890 ("RawCopy docstring: add note about rebuilding") to improve the documentation on this.
  3. Due to construct/construct#888 ("build_file() crashes when using RawCopy with a value only"), calling PyKeePass.save() then crashes. I filed construct/construct#889 ("build_file() stream reading") to fix this; in the meantime, I worked around it by creating the stream myself (with w+b) and calling build_stream() instead. You'll either need construct/construct#888 fixed upstream before you can fix this issue, or you'll need a similar workaround in PyKeePass.save().

Phew!
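
For illustration, here is a minimal, untested sketch of what such a workaround might look like for a version 4 database with argon2, using the attribute paths quoted in this thread (the helper name is mine, and error handling is omitted):

import secrets

def rekey_kdbx4(kp):
    # Hypothetical helper: regenerate the random header fields of an open
    # PyKeePass database (KDBX4, argon2 KDF), per the steps in this thread.
    dyn = kp.kdbx.header.value.dynamic_header
    for field in (dyn.master_seed, dyn.encryption_iv):
        field.data = secrets.token_bytes(len(field.data))
    # KDBX4 keeps the protected stream key in the inner header
    psk = kp.kdbx.body.payload.inner_header.protected_stream_key
    psk.data = secrets.token_bytes(len(psk.data))
    # the KDF salt lives in the kdf_parameters variant dict ("S" for argon2)
    salt = dyn.kdf_parameters.data.dict["S"]
    salt.value = secrets.token_bytes(len(salt.value))
    # gotcha 2 above: drop the cached raw bytes so RawCopy rebuilds the
    # header from .value instead of replaying the old data
    del kp.kdbx.header.data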

Evidlo (Member) commented Dec 22, 2020

Thanks for the good work.

> Probably obvious to pykeepass authors, but it wasn't to me: the correct attributes inside the kdbx structure to change differ depending on major_version; and for version 4, they differ again depending on kdbx.header.value.dynamic_header.kdf_parameters.data.dict["$UUID"].

Yes, the location of the KDF parameters within the database bytestream changes depending on the database version, which is reflected in the object that kdbx_parsing returns. Ideally the kdbx_parsing would abstract this away from the main pykeepass code, but this is a limitation of construct that I couldn't figure out how to get around when I wrote the parsing code. I'm starting to consider moving away from construct and just doing the parsing with the struct module (see here, although this is very old code and can't build databases). This would give us full control over where the parsed values are stored in the database object and might make things like converting between database versions easier.
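
As a rough illustration of that direction: a KDBX4 dynamic header is a sequence of TLV fields (a 1-byte field id, a 4-byte little-endian length, then the payload), which the struct module can walk directly. A sketch of the idea only, not pykeepass code:

import struct

def parse_dynamic_header(buf, offset=12):
    # Walk the KDBX4 dynamic header TLVs; offset 12 skips the two magic
    # signatures and the version field.  Untested sketch.
    fields = {}
    while True:
        field_id, length = struct.unpack_from("<BI", buf, offset)
        offset += 5
        fields[field_id] = buf[offset:offset + length]
        offset += length
        if field_id == 0:  # end-of-header marker
            return fields, offset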

> After modifying kdbx.header.value, I needed to del kdbx.header.data because a RawCopy will rebuild from its data attribute if that still exists. I filed construct/construct#890 to improve the documentation on this.

You actually figured out a problem I was having. I wanted to add functionality for changing the encryption and key derivation methods, but my changed values kept being overwritten by the previously copied bytes.

basak (Contributor, author) commented Dec 23, 2020

> Ideally the kdbx_parsing would abstract this away from the main pykeepass code, but this is a limitation of construct that I couldn't figure out how to get around when I wrote the parsing code. I'm starting to consider moving away from construct...

FWIW, I didn't know about construct before I started looking at pykeepass a couple of weeks ago, and I'm very impressed both by construct and how you have used it. It's really nice how pykeepass deals with the different format versions "all at once" in a single structure. I'm looking forward to using construct to solve a similar problem I'm busy solving in a different project.

The way construct works, I think it's necessary that the data structure that comes out of it mirrors the file format itself, rather than abstracting across fields that appear in different places in different format versions.

To abstract over this, I'd do it outside of construct, but still use construct as you have done. For example, you could create a "proxy" Python object that intercepts getattr() and setattr() and maps them to different properties inside the current kdbx structure depending on major_version. This could be stateless except that it would be instantiated with a kdbx reference. It could then present a more unified view into kdbx, which kdbx_parsing would supply for the main PyKeePass class to use. A sketch follows.
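
A hypothetical sketch of that proxy idea (the class and its attribute map are illustrative only, not pykeepass API; just one field is mapped here):

class SeedView:
    """Hypothetical version-independent view onto a parsed kdbx object."""

    # maps a unified name to the per-version attribute path
    _PATHS = {
        3: {"protected_stream_key":
            "header.value.dynamic_header.protected_stream_key.data"},
        4: {"protected_stream_key":
            "body.payload.inner_header.protected_stream_key.data"},
    }

    def __init__(self, kdbx):
        # bypass our own __setattr__ when storing the reference
        object.__setattr__(self, "_kdbx", kdbx)

    def _resolve(self, name):
        version = self._kdbx.header.value.major_version
        *parts, leaf = self._PATHS[version][name].split(".")
        obj = self._kdbx
        for part in parts:
            obj = getattr(obj, part)
        return obj, leaf

    def __getattr__(self, name):
        obj, leaf = self._resolve(name)
        return getattr(obj, leaf)

    def __setattr__(self, name, value):
        obj, leaf = self._resolve(name)
        setattr(obj, leaf, value)

PyKeePass.save() could then regenerate seeds through such a view without branching on major_version itself.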

arekbulski commented

Disclosure: I am the maintainer of Construct.

Thanks for the high praise, basak. I was impressed by Construct too, and that is why I became its maintainer. Evidlo, I encourage you to post more issues at Construct's forum. It's clear you had some issues with Construct, but maybe we can resolve them. I don't think using the struct module is a good idea in a sophisticated project like this.

I can confirm construct/construct#888 will be resolved upstream before the end of the year.

Another disclosure: I took Stanford's Cryptography class, which I have to admit was probably the best course at Coursera I ever had, and I have about 20 certificates from there, so... yeah, back to the topic:

Unfortunately I don't know pykeepass well enough to advise anything, but I can say one thing: using the same seeds opens the way to rainbow-table attacks, which I think is what basak meant by precomputation attacks. I believe basak is right on this.

basak added a commit to basak/pykeepass that referenced this issue Dec 27, 2020
The official KeePass client produces and expects a 64 bit field, not a
32 bit field, for this parameter.

Using the incorrect length happens to parse just fine. The correct
length is encoded into the header so there is no parsing misalignment.
And because the field is little endian, parsing the field itself as a 32
bit field even though it is actually a 64 bit field also works, provided
that the field value fits into 32 bits, which it typically does.

However, if we write the header field back out, we write it with a 32 bit
length, and KeePass rejects this. We weren't writing out the header
ourselves previously because kdbx.header.data wasn't being removed (see
libkeepass#219 (comment)).
However if we start doing that, such as to fix issue libkeepass#219, then this bug
is revealed.

The fix is trivial: we change the declaration to be a 64 bit field to
match the official client.
basak (Contributor, author) commented Dec 27, 2020

A couple more loose ends for version 3:

  1. There's a minor bug with a fix in #222 ("kdbx3: fix length of transform_rounds field"), without which the official KeePass client won't accept version 3 files.
  2. HeaderHash in the XML needs recalculating, without which the official KeePass client rejects a changed header as corrupt. Deleting the field also worked, though recalculating seems better. I think you can probably do this in UnprotectedStream, as the recomputed header data field should be available at that stage, though I haven't tested this. I guess that would want to be conditional on major_version. As a workaround, I recomputed the header an additional time by hand and then modified the tree attribute, which worked; see the sketch after this list.
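
For concreteness, a sketch of that recalculation (untested; it assumes the rebuilt header bytes are reachable as kp.kdbx.header.data and the XML as kp.tree, matching how they are referred to in this thread):

import base64
import hashlib

def update_header_hash(kp):
    # KDBX3 stores base64(SHA-256(raw header bytes)) in Meta/HeaderHash;
    # recompute it after the header has been rebuilt.
    digest = hashlib.sha256(kp.kdbx.header.data).digest()
    node = kp.tree.find(".//HeaderHash")
    if node is not None:
        node.text = base64.b64encode(digest).decode()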

basak (Contributor, author) commented Dec 27, 2020

One more. It's probably also wise to regenerate kdbx.header.value.dynamic_header.stream_start_bytes.data in the case of version 3, to stop an attacker building up sets of known plaintexts. I don't know whether a cryptographic attack is possible otherwise, but again I believe the official client does this, so rather than trying to verify that skipping it is safe, I think it makes sense to just do it.

To summarise, here are the fields to regenerate for version 3:

  • kdbx.header.value.dynamic_header.master_seed.data
  • kdbx.header.value.dynamic_header.encryption_iv.data
  • kdbx.header.value.dynamic_header.protected_stream_key.data
  • kdbx.header.value.dynamic_header.stream_start_bytes.data

For version 4:

  • kdbx.header.value.dynamic_header.master_seed.data
  • kdbx.header.value.dynamic_header.encryption_iv.data
  • kdbx.body.payload.inner_header.protected_stream_key.data

And additionally the transform seed: kdbx.header.value.dynamic_header.transform_seed.data for version 3, and the KDF-specific salt for version 4 (e.g. kdbx.header.value.dynamic_header.kdf_parameters.data.dict.S.value if argon2 is used).

Then HeaderHash (if version 3) and the individual protected XML password elements need recalculating.

I've not looked into version < 3 (doesn't affect me).
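
To pair with the earlier version 4 sketch, the version 3 half of this summary would look roughly like this (again untested, and again using the attribute paths quoted above):

import secrets

def rekey_kdbx3(kp):
    # Regenerate the version 3 random fields listed above.
    dyn = kp.kdbx.header.value.dynamic_header
    for field in (dyn.master_seed, dyn.encryption_iv,
                  dyn.protected_stream_key, dyn.stream_start_bytes,
                  dyn.transform_seed):
        field.data = secrets.token_bytes(len(field.data))
    # rebuild the header from .value rather than the cached raw bytes
    del kp.kdbx.header.data
    # HeaderHash and the protected XML values must then be recomputed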

Evidlo pushed a commit that referenced this issue Jan 4, 2021
pschmitt pushed a commit to pschmitt/pykeepass that referenced this issue Jan 27, 2021

Evidlo (Member) commented Feb 6, 2021

Here's my solution to generate new seeds in a simplified pseudo-KeePass database.

from construct import Struct, RawCopy, Bytes, Checksum, this, Subconstruct
from construct import stream_seek, stream_tell, stream_read, stream_write
from Cryptodome.Random import get_random_bytes
from io import BytesIO

# simple placeholder checksum function
def check_func(header_data, master_seed):
    return header_data + master_seed


class RandomBytes(Bytes):
    """Same as Bytes, but generate random bytes when building"""

    def _build(self, obj, stream, context, path):
        length = self.length(context) if callable(self.length) else self.length
        data = get_random_bytes(length)
        stream_write(stream, data, length, path)
        return data

class Copy(Subconstruct):
    """Same as RawCopy, but don't create parent container when parsing.
    Instead store data in ._data attribute of subconstruct, and never rebuild from data
    """

    def _parse(self, stream, context, path):
        offset1 = stream_tell(stream, path)
        obj = self.subcon._parsereport(stream, context, path)
        offset2 = stream_tell(stream, path)
        stream_seek(stream, offset1, 0, path)
        obj._data = stream_read(stream, offset2-offset1, path)
        return obj

    def _build(self, obj, stream, context, path):
        offset1 = stream_tell(stream, path)
        obj = self.subcon._build(obj, stream, context, path)
        offset2 = stream_tell(stream, path)
        stream_seek(stream, offset1, 0, path)
        obj._data = stream_read(stream, offset2-offset1, path)
        return obj

# simple database format with header and checksum
s = Struct(
    "header" / Copy(Struct(
        "master_seed" / RandomBytes(8),
        "more_data" / Bytes(8)
    )),
    "hash" / Checksum(
        Bytes(24),
        lambda ctx: check_func(ctx.header._data, ctx.header.master_seed),
        this
    )
)

# synthesize a 'database'
master_seed = b'00000000'
more_data = b'12345678'
header = master_seed + more_data
data = header + check_func(header, master_seed)

# parse the database
parsed = s.parse(data)

# rebuild and try to reparse final result
data2 = s.build(parsed)
s.parse(data2)
assert data != data2, "Database unchanged"

Evidlo (Member) commented Feb 17, 2021

I'd appreciate some feedback on the above.

arekbulski commented

As far as Construct is concerned, I see no problems with your code.

basak (Contributor, author) commented Feb 28, 2021

I really like this approach. Nice work! I wonder if RandomBytes and Copy are sufficiently general to be worth including in Construct itself? If not, they work equally well here.

One downside of this approach is that each build triggers a rekey. This is what we want to fix this issue, of course, but I wonder if there are times when we might want to build without rekeying. For example, for unit testing.

I don't have a strong feeling as to whether explicit or implicit rekeying would be better. I think it's difficult to say without attempting one approach or the other.
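
For what it's worth, explicit rekeying could be as simple as gating the randomisation; a hypothetical sketch (the module-level switch is mine, not anything construct provides):

from construct import Bytes
from Cryptodome.Random import get_random_bytes

REKEY = True  # set False to build without rekeying, e.g. in unit tests

class MaybeRandomBytes(Bytes):
    """Like RandomBytes above, but only regenerates when REKEY is set;
    otherwise builds the supplied bytes unchanged."""

    def _build(self, obj, stream, context, path):
        if REKEY:
            length = self.length(context) if callable(self.length) else self.length
            obj = get_random_bytes(length)
        return super()._build(obj, stream, context, path)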

I think this mechanism should be sufficient to do most of the necessary work. HeaderHash (version 3) will need recalculating, per my notes above; I'm not sure whether this mechanism will handle that or whether it needs implementing separately.

arekbulski commented

Those two classes surely are a creative work, but they won't get into Construct mainstream. Copy already does what RawCopy does, and in a way I don't like; it's too obscure. And RandomBytes is not general enough to be included. But if this works for you, I don't see why not.

A6GibKm (Contributor) commented Jun 23, 2021

Is there any update to this? It seems like a rather problematic issue.

vszakats commented Jan 7, 2023

Can someone suggest the necessary custom code to init the seeds (KDBX4, default KDF/encryption) before calling .save()?

kp = create_database("test.kdbx", password="test", keyfile=None, transformed_key=None)
kp.kdbx.body.payload.xml = etree.fromstring(xml_str)
kp.kdbx.header.value.dynamic_header.master_seed.data = get_random_bytes(8)
kp.kdbx.header.value.dynamic_header.encryption_iv.data = ??
kp.kdbx.body.payload.inner_header.protected_stream_key.data = ??
#kp.kdbx.header.value.dynamic_header.kdf_parameters.data.dict.S.value =   # if argon2 is used
kp.save()

Without this, the output is reproducible, which is a showstopper for encrypted data.

A6GibKm added a commit to A6GibKm/pykeepass that referenced this issue Nov 3, 2023
A6GibKm linked a pull request Nov 3, 2023 that will close this issue
A6GibKm added a commit to A6GibKm/pykeepass that referenced this issue Nov 3, 2023
A6GibKm added a commit to A6GibKm/pykeepass that referenced this issue Nov 4, 2023
SpencerMcO commented
Any update on this? I don't think I'd have the bandwidth or know-how to do this myself, but would definitely like to resolve this vulnerability

A6GibKm (Contributor) commented Nov 15, 2024

I tried at #366, but I am not familiar enough with the internals to make it work; some tests fail and I was not able to find out why.

This is where I am very grateful for the massive number of tests in the project.

Evidlo (Member) commented Nov 29, 2024

I've tracked the issue down to DynamicDict. The problem is that when the database is being built after a save, the value of master_seed seen in compute_header_hmac_hash is the old value before being randomly generated. Below are two standalone examples utilizing DynamicDict and RandomBytes that demonstrate the issue:

Broken Example (with DynamicDict)

#!/usr/bin/env python3

# debugging why RandomBytes doesn't update context

from collections import OrderedDict
from Cryptodome.Random import get_random_bytes
from construct import (
    Bytes, stream_write, Struct, Checksum, this,
    GreedyBytes, Prefixed, Mapping, Byte, Int32ul,
    Switch, RepeatUntil, RawCopy, Adapter, Container, ListContainer
)

class DynamicDict(Adapter):
    """ListContainer <---> Container
    Convenience mapping so we dont have to iterate ListContainer to find
    the right item

    FIXME: lump kwarg was added to get around the fact that InnerHeader is
    not truly a dict.  We lump all 'binary' InnerHeaderItems into a single list
    """

    def __init__(self, key, subcon, lump=[]):
        super().__init__(subcon)
        self.key = key
        self.lump = lump

    # map ListContainer to Container
    def _decode(self, obj, context, path):
        d = OrderedDict()
        for l in self.lump:
            d[l] = ListContainer([])
        for item in obj:
            if item[self.key] in self.lump:
                d[item[self.key]].append(item)
            else:
                d[item[self.key]] = item

        return Container(d)

    # map Container to ListContainer
    def _encode(self, obj, context, path):
        l = []
        for key in obj:
            if key in self.lump:
                l += obj[key]
            else:
                l.append(obj[key])

        return ListContainer(l)


class RandomBytes(Bytes):
    """Same as Bytes, but generate random bytes when building"""

    def _build(self, obj, stream, context, path):
        print('generating random bytes...')
        length = self.length(context) if callable(self.length) else self.length
        data = get_random_bytes(length)
        print('old:', sum(context.data))
        print('new:', sum(data))
        stream_write(stream, data, length, path)
        return data

def compute_hash(context):
    """dumb hashing function which always returns value 1"""

    # this is supposed to be a newly random value (when building)
    print('rand seen during hashing:', sum(context.head.value.rand.data))
    return b'1'

Item = Struct(
    'id' / Mapping(Byte, {'rand': 4, 'end': 0}),
    'data' / Switch(
        this.id,
        {
            'rand': RandomBytes(32),
            'end': Bytes(32),
        },
    )
)

Head = DynamicDict('id', RepeatUntil(
    lambda item, a, b: item.id == 'end',
    Item
))

# header
s = Struct(
    'head' / RawCopy(Head),
    'hash' / Checksum(
        Bytes(1),
        compute_hash,
        this,
    )
)

# raw data to parse
b = (
    # --- Random ---
    # id
    b'\x04' +
    # seed
    b'1' * 32 +
    # --- End ---
    # id
    b'\x00' +
    # value
    b'1' * 32 +
    # --- Hash ---
    b'1'
)

print('-------------')
print('PARSING')
print('-------------')
parsed1 = s.parse(b)

print('-------------')
print('BUILDING')
print('-------------')
del parsed1.head.data
built = s.build(parsed1)

print('-------------------------------')
print('PARSE')
print('-------------------------------')
parsed2 = s.parse(built)

>>> %run issue219_randombytes_context_broken.py
-------------
PARSING
-------------
rand seen during hashing: 1568
-------------
BUILDING
-------------
generating random bytes...
old: 1568
new: 3161
rand seen during hashing: 1568
-------------------------------
PARSE
-------------------------------
rand seen during hashing: 3161

Working Example (without DynamicDict)

#!/usr/bin/env python3

# debugging why RandomBytes doesn't update context

from collections import OrderedDict
from Cryptodome.Random import get_random_bytes
from construct import (
    Bytes, stream_write, Struct, Checksum, this,
    GreedyBytes, Prefixed, Mapping, Byte, Int32ul,
    Switch, RepeatUntil, RawCopy, Adapter, Container, ListContainer
)

class RandomBytes(Bytes):
    """Same as Bytes, but generate random bytes when building"""

    def _build(self, obj, stream, context, path):
        print('generating random bytes...')
        length = self.length(context) if callable(self.length) else self.length
        data = get_random_bytes(length)
        print('old:', sum(context.data))
        print('new:', sum(data))
        stream_write(stream, data, length, path)
        return data

def compute_hash(context):
    """dumb hashing function which always returns value 1"""

    # this is supposed to be a newly random value (when building)
    print('rand seen during hashing:', sum(context.head.value[0].data))
    return b'1'

Item = Struct(
    'id' / Mapping(Byte, {'rand': 4, 'end': 0}),
    'data' / Switch(
        this.id,
        {
            'rand': RandomBytes(32),
            'end': Bytes(32),
        },
    )
)

Head = RepeatUntil(
    lambda item, a, b: item.id == 'end',
    Item
)

# header
s = Struct(
    'head' / RawCopy(Head),
    'hash' / Checksum(
        Bytes(1),
        compute_hash,
        this,
    )
)

# raw data to parse
b = (
    # --- Random ---
    # id
    b'\x04' +
    # seed
    b'1' * 32 +
    # --- End ---
    # id
    b'\x00' +
    # value
    b'1' * 32 +
    # --- Hash ---
    b'1'
)

print('-------------')
print('PARSING')
print('-------------')
parsed1 = s.parse(b)

print('-------------')
print('BUILDING')
print('-------------')
del parsed1.head.data
built = s.build(parsed1)

print('-------------------------------')
print('PARSE')
print('-------------------------------')
parsed2 = s.parse(built)

>>> %run issue219_randombytes_context_works.py
-------------
PARSING
-------------
rand seen during hashing: 1568
-------------
BUILDING
-------------
generating random bytes...
old: 1568
new: 4489
rand seen during hashing: 4489
-------------------------------
PARSE
-------------------------------
rand seen during hashing: 4489

In the Building stage, the value of rand seen during hashing should be equal to the new value of the rand field. Somehow DynamicDict needs to be modified so that the new randomly generated value is available during building when compute_hash is called.
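
If that diagnosis is right, one possible direction, offered purely as an untested idea: construct's Adapter._build returns the original pre-build object rather than the freshly built one, so overriding _build in DynamicDict to re-decode the build result might make the newly generated values visible to later fields such as the Checksum:

class RekeyableDynamicDict(DynamicDict):
    """Untested idea: return the re-decoded build result so the context
    carries the freshly generated bytes instead of the stale input."""

    def _build(self, obj, stream, context, path):
        buildret = self.subcon._build(
            self._encode(obj, context, path), stream, context, path)
        return self._decode(buildret, context, path)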

Unfortunately I am deep in writing my dissertation right now so I don't have a ton of free time to spend.

A6GibKm added a commit to A6GibKm/pykeepass that referenced this issue Nov 29, 2024