Spike: Temporary storage for a secret store password. #43

Open: wants to merge 4 commits into base: main

Conversation

Collaborator

@ahsimb ahsimb commented Dec 12, 2023

closes #42

It's kinda a proposal for the implementation of temporary storage for a secret store password. New ideas, suggestions, etc, are very welcome.

@ahsimb ahsimb added the feature Product feature label Dec 12, 2023
@ahsimb ahsimb self-assigned this Dec 12, 2023
Member

@Nicoretti Nicoretti left a comment

The code looks quite clean. A few comments from my side:

  • Doing most of the CRC calculation on top of strings isn't the most efficient, but if it's fast enough, so be it.
  • This looks like regular SHM; I don't see where there would be any password specifics ("extra security").
    • What is the problem scenario you are trying to improve/solve?

exasol/shared_memory_vault.py (outdated; two resolved review threads)
Collaborator Author

ahsimb commented Dec 12, 2023

The problem I am trying to solve concerns notebook usage. Most notebooks need access to the secret store, where we keep all sorts of configuration. Some of it is sensitive, for example the database connection details, so the store is protected by a password. Currently, the user has to enter the password in every notebook, which is a bit tedious. The idea is to let the user enter the password once and put it into an in-memory store. Ideally, we would also limit the lifespan of this storage.
Each notebook runs in its own process, and there is no parent process that we can control, so the notebooks have to use the temporary store in a cooperative manner. There is no way for one process to pass a multiprocessing object, such as a Queue, to another process. Jupyter notebooks have their own way of sharing data, but it involves writing the data to a file, which we want to avoid.
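
For readers following the thread, here is a minimal sketch of the cooperative scheme described above: unrelated processes agree on a block name and attach to it by that name. The name `notebook_password_store`, the size, the one-byte length prefix and the helper names are assumptions made for illustration; they are not taken from the PR.

```python
from multiprocessing.shared_memory import SharedMemory

STORAGE_NAME = "notebook_password_store"  # hypothetical name, not from the PR
MAX_SIZE = 256                            # hypothetical block size in bytes


def open_or_create(name: str = STORAGE_NAME, size: int = MAX_SIZE) -> SharedMemory:
    """Attach to the named block if it exists, otherwise create it.

    Two processes may still race between the failed attach and the create;
    this sketch does not handle that case.
    """
    try:
        return SharedMemory(name=name, create=False)
    except FileNotFoundError:
        return SharedMemory(name=name, create=True, size=size)


def write_secret(secret: bytes, name: str = STORAGE_NAME) -> None:
    """Store the secret as one length byte followed by the payload."""
    shm = open_or_create(name)
    try:
        if len(secret) > min(255, len(shm.buf) - 1):
            raise ValueError("secret does not fit into the shared block")
        shm.buf[0] = len(secret)
        shm.buf[1:1 + len(secret)] = secret
    finally:
        shm.close()


def read_secret(name: str = STORAGE_NAME) -> bytes:
    """Read the secret back; raises FileNotFoundError if no block exists."""
    shm = SharedMemory(name=name, create=False)
    try:
        length = shm.buf[0]
        data = bytes(shm.buf[1:1 + length])  # copy out before closing
    finally:
        shm.close()
    return data
```

One notebook would call `write_secret(...)` after the user enters the password; any other notebook can then call `read_secret()` and must be prepared for the block to be missing.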

return ''.join('0' if a == b else '1' for a, b in zip(sequence, crc_divisor))


def compute_crc(sequence: str, crc_divisor: str) -> str:

Collaborator

I wouldn't compute this by hand. Python itself doesn't provide a CRC, but a hash algorithm should also be fine here; performance isn't that critical. Alternatively, we would need to import https://pypi.org/project/crc/

Collaborator

Alternatively, there is a crc32 in the standard library: https://docs.python.org/3/library/binascii.html?highlight=crc32#binascii.crc32
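
A minimal sketch of what the standard-library route could look like, computing the checksum over both the timestamp and the content as the PR's docstring describes. The helper names and the ISO-format serialization of the timestamp are assumptions for illustration, not the PR's encoding.

```python
import binascii
from datetime import datetime


def checksum(content: str, creation_time: datetime) -> int:
    """CRC-32 over the serialized timestamp followed by the content."""
    payload = creation_time.isoformat().encode("utf-8") + content.encode("utf-8")
    return binascii.crc32(payload)


def is_intact(content: str, creation_time: datetime, expected_crc: int) -> bool:
    """Return True if the stored checksum matches the recomputed one."""
    return checksum(content, creation_time) == expected_crc
```

A reader would recompute the checksum after decoding and treat a mismatch the same way as a missing password.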

Collaborator Author

Maybe I will leave it as it is for now. The chances are it will be gone. For example, if we start using the file lock we definitely can scrap the CRC.

Collaborator

👍

def encode(content: str, crc_divisor: str,
creation_time: datetime) -> Tuple[int, bytearray]:
"""
Creates a bytearray with encoded content and its creation datetime.

Collaborator

What is the reasoning for adding the creation time?

Collaborator Author

We may want to limit the lifetime of a password in the shared memory. If we have a process that periodically performs a health check or something, it can clear the shared memory content if it was created let's say more than 4 hours ago.
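
The lifetime check described here could be sketched as follows. The four-hour limit is the example figure from the comment; the function name and the injectable `now` parameter are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_AGE = timedelta(hours=4)  # example figure from the comment above


def is_expired(creation_time: datetime,
               now: Optional[datetime] = None,
               max_age: timedelta = MAX_AGE) -> bool:
    """Return True if the stored content is older than max_age."""
    if now is None:
        now = datetime.now()
    return now - creation_time > max_age
```

A periodic health check could call `is_expired` with the decoded creation time and clear the shared memory when it returns True.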

Collaborator

👍

Cyclic Redundancy Check (CRC) code. The CRC is computed over both the timestamp
and the content.

The need for a CRC is debatable. Its use is motivated by the problem of

Collaborator

Concurrent access is not hypothetical; it is possible with our setup. I would recommend protecting the shared memory with an inter-process lock. In the ITDE we used a file lock for that.
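
A minimal sketch of an inter-process file lock, assuming a Unix host and using `fcntl.flock` from the standard library. Which locking library the ITDE actually uses is not stated in this thread, and the `file_lock` helper and lock path are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def file_lock(lock_path: str):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other holder remains
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Readers and writers would both wrap their shared-memory access in `with file_lock(path):`, so a reader never sees a half-written block. The lock file itself carries no data.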

Collaborator Author

@ahsimb ahsimb Dec 13, 2023

Yes, I know, a file lock is usually used as a synchronization mechanism for shared memory. I don't think it's worth doing in this case unless this code finds another usage. Why would we use it in the setup? Currently, the concept is quite simple: 1. The shared memory may or may not contain the data we are after. 2. The data may be corrupted. The user has to handle both cases, which works well for passwords. We try to minimise the chances of (2) happening by using the CRC. But since we cannot completely eliminate this possibility I wonder if it's worth doing it at all.

Collaborator

In the worst case, the data is still being written when the user quickly switches to another notebook. You can easily run into issues if you don't do it properly.

creation_time: datetime) -> Tuple[int, bytearray]:
"""
Creates a bytearray with encoded content and its creation datetime.
Currently, the content is not being encrypted. It gets appended by the

Collaborator

About the encryption part: we can't encrypt the password, because we would need another shared password for that. However, we also shouldn't use the user's plain-text password. My suggestion is to run a key derivation on the entered password and use the derived key as the password for sqlcipher. This way we only need to share the derived key.
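
A minimal sketch of the suggested key derivation, using PBKDF2 from the standard library. The choice of PBKDF2-HMAC-SHA256, the iteration count, the key length and the `derive_key` name are assumptions for illustration; the thread does not name an algorithm. The salt has to be the same every time the user enters the password, so it is assumed to be kept alongside the secret store.

```python
import hashlib

ITERATIONS = 600_000  # hypothetical work factor, tune to the target hardware


def derive_key(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Derive a 32-byte key from the user's password; the result is not reversible."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )
```

Only the derived key (for example in its `.hex()` form) would then be placed in shared memory and handed to the store, so the typed password never needs to leave the notebook that asked for it.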

Collaborator

This protects the clear-text password from being reused in other attacks.

Collaborator Author

Can do that, although I don't think it makes a big difference. Why can't the attacker just use the same function we provide, with all its encoding?

Collaborator

A key derivation is not reversible, so they can't compute the original password, provided we clear the original password out of memory as fast as possible.

return SharedMemory(name=storage_name, create=True, size=max_size)


def write_to_sm(content: str, creation_time: Optional[datetime] = None,

Collaborator

We currently use strings to pass the password around. This way we have no safe way to cleanse it from the process itself.
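
To illustrate the point: a Python `str` is immutable, so it cannot be overwritten in place, whereas a `bytearray` can. The `wipe` helper below is a hypothetical sketch; it only clears the buffer it is given and cannot reach copies made elsewhere (for example by `decode()` or by the interpreter).

```python
def wipe(buffer: bytearray) -> None:
    """Overwrite every byte of the buffer with zero, in place."""
    for i in range(len(buffer)):
        buffer[i] = 0


# Keep the password in a mutable buffer and clear it as soon as it was used.
password = bytearray(b"example-password")
try:
    pass  # hand the buffer to whatever needs it here
finally:
    wipe(password)
```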

Collaborator

We might also need to disable core dumps, but I first need to check how and where.
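
One possible way to do this on Unix is the process resource limit for core files, available through the standard `resource` module. This is a sketch under the assumption that each notebook kernel sets the limit for itself; where the call would belong in this project is exactly the open question in the comment, and the function name is hypothetical.

```python
import resource


def disable_core_dumps() -> None:
    """Set the soft and hard core-file size limits of this process to zero."""
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
```

The limit is inherited by child processes, and once the hard limit is lowered an unprivileged process cannot raise it again.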

Collaborator Author

You would need to protect not just the secret store password but everything that the password itself is designed to protect - all sensitive data that goes in and out of the secret store - database credentials, AWS credentials, etc. Are we really up for that?

Collaborator

It is only about removing the password from memory as fast as possible. sqlcipher probably runs a key derivation on the password we give it, so it won't keep the original one in memory.

return False, datetime.min, ''


def _open_shared_memory(storage_name: str, max_size: int,

Collaborator

I now also remembered the issue with shared memory and Docker. All is nice and good if you assume Docker's default behavior regarding the IPC namespace. However, there is the option --ipc=host; in that case the shared memory of the host is used and is not cleared during docker stop. We need at least a warning in the docs, and the entry point process needs to try to remove it when stopping. With a process this would be given and we would only need to care about clearing.
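
A minimal sketch of the cleanup an entry point process could attempt when stopping. The helper names are hypothetical. `docker stop` sends SIGTERM first, and Python's default SIGTERM handling ends the process without running `atexit` hooks, so the sketch turns SIGTERM into a normal interpreter exit. A SIGKILL after the grace period cannot be intercepted, so the warning in the docs is still needed.

```python
import atexit
import signal
import sys
from multiprocessing.shared_memory import SharedMemory


def remove_shared_memory(name: str) -> bool:
    """Zero out and unlink the named block; return False if it does not exist."""
    try:
        shm = SharedMemory(name=name, create=False)
    except FileNotFoundError:
        return False
    shm.buf[:] = bytes(len(shm.buf))  # overwrite the content before unlinking
    shm.close()
    shm.unlink()
    return True


def install_cleanup(name: str) -> None:
    """Remove the block on normal exit and on SIGTERM."""
    atexit.register(remove_shared_memory, name)
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
```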

Collaborator

And, I had an idea how we don't need to care about this. For reading we can create a session copy of the DB with a session password.

@kaklakariada kaklakariada changed the title Temporary storage for a secret store password. Spike: Temporary storage for a secret store password. Dec 18, 2023
@kaklakariada
Contributor

This is a spike. Ticket #42 will be closed; this PR will stay open for later reference.

Labels: feature (Product feature)
Projects: None yet
Development: Successfully merging this pull request may close this issue: Spike: Secret store password sharing mechanism via shared memory
4 participants