-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spike: Temporary storage for a secret store password. #43
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,248 @@ | ||
from typing import Optional, Tuple | ||
from multiprocessing.shared_memory import SharedMemory | ||
from math import ceil | ||
from datetime import datetime | ||
import struct | ||
|
||
|
||
DEFAULT_CRC_DIVISOR = '1001011001100101' | ||
DEFAULT_MAX_SIZE = 100 | ||
DEFAULT_STORAGE_NAME = 'notebook_connector_vault' | ||
|
||
|
||
def _xor(sequence: str, crc_divisor: str) -> str: | ||
return ''.join('0' if a == b else '1' for a, b in zip(sequence, crc_divisor)) | ||
|
||
|
||
def compute_crc(sequence: str, crc_divisor: str) -> str: | ||
""" | ||
Computes a Cyclic Redundancy Check (CRC) code for a provided binary sequence using | ||
a provided polynomial divisor. | ||
Check this wiki for details: https://en.wikipedia.org/wiki/Cyclic_redundancy_check | ||
For example if the sequence is '11010011101100000' and the divisor is '011' the | ||
output crc code will be '100'. Note that the n+1 bit long divisor, commonly used in | ||
literature, 1011 in the above example, is assumed to have the most significant bit | ||
(MSB) equal 1. Here the MSB is omitted. | ||
""" | ||
n = len(crc_divisor) | ||
reminder = sequence[:n] | ||
for i in range(n, len(sequence)): | ||
starts_one = reminder[0] == '1' | ||
reminder = reminder[1:] + sequence[i] | ||
if starts_one: | ||
reminder = _xor(reminder, crc_divisor) | ||
return reminder | ||
|
||
|
||
def _get_byte_size(n: int) -> int: | ||
"""Returns the number of bytes required to store an integer value""" | ||
return int(ceil(n.bit_length() / 8)) | ||
|
||
|
||
def _get_divisor_size(crc_divisor: str) -> int: | ||
"""Calculates the byte size of a CRC divisor""" | ||
return int(ceil(len(crc_divisor) / 8)) | ||
|
||
|
||
def _bytes_to_bin(msg_bytes: bytes) -> str: | ||
"""Converts byte array to a binary string, e.g.[6, 7] => 01100111""" | ||
return ''.join(format(c, '08b') for c in msg_bytes) | ||
|
||
|
||
def encode(content: str, crc_divisor: str, | ||
creation_time: datetime) -> Tuple[int, bytearray]: | ||
""" | ||
Creates a bytearray with encoded content and its creation datetime. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the reasoning for adding creation time There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We may want to limit the lifetime of a password in the shared memory. If we have a process that periodically performs a health check or something, it can clear the shared memory content if it was created let's say more than 4 hours ago. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍 |
||
Currently, the content is not being encrypted. It gets appended by the | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. About the encryption part. We can't encrypt the password,because we would need another shared another password for that. However, we also shouldn't use the users plain text password. My suggestion is to run a key derivation on the entered password and use the derived key as the password for sqlcipher. This way we only need to share the derived key. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This protects the clear text password from reused in other attacks There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can do that, although I don't think it makes a big difference. Why can't the attacker just use the same function we provide, with all its encoding? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A key derivation is not reversible, so they can't compute the original password and if we clean the original password as fast as possible out of the memory |
||
Cyclic Redundancy Check (CRC) code. The CRC is computed over both the timestamp | ||
and the content. | ||
|
||
The need for a CRC is debatable. Its use is motivated by the problem of | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The concurrent access is not hypothetical. It is possible with our setup. I would recommend to protect the share memory with inter process lock. In the ITDE we used a file lock for that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I know, a file lock is usually used as a synchronization mechanism for shared memory. I don't think it's worth doing in this case unless this code finds another usage. Why would we use it in the setup? Currently, the concept is quite simple: 1. The shared memory may or may not contain the data we are after. 2. The data may be corrupted. The user has to handle both cases, which works well for passwords. We try to minimise the chances of (2) happening by using the CRC. But since we cannot completely eliminate this possibility I wonder if it's worth doing it at all. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the worst case, the stuff is currently written, and the user quickly changes to another notebook. You can get simply issues, if you don't do it properly. |
||
non-synchronised concurrent access to the shared memory. In theory, the content | ||
can get corrupted by simultaneous reading/writing. Hence, is the need to validate | ||
the data retrieved from the shared memory. On the other hand, given the use case, | ||
the possibility of concurrent access is only hypothetical. The implication of | ||
reading an impaired data is insignificant and CRC cannot guarantee the validity | ||
with 100% accuracy anyway. | ||
""" | ||
|
||
# Prepend the content by the timestamp and convert the whole thing to a binary | ||
# sequence. | ||
ts_bytes = struct.pack('d', creation_time.timestamp()) | ||
body = ts_bytes + content.encode('utf8') | ||
bin_body = _bytes_to_bin(body) | ||
|
||
# Compute the CRC of the content right-padded with n zeros. | ||
divisor_size = _get_divisor_size(crc_divisor) | ||
padding = '0' * len(crc_divisor) | ||
bin_cr = compute_crc(bin_body + padding, crc_divisor) | ||
cr = int(bin_cr, 2) | ||
|
||
# Put together the content bytes and the CRC bytes. | ||
cont_size = len(body) + divisor_size | ||
enc_content = bytearray(cont_size) | ||
enc_content[:-divisor_size] = body | ||
enc_content[-divisor_size:] = cr.to_bytes(divisor_size, byteorder='big', | ||
signed=False) | ||
return cont_size, enc_content | ||
|
||
|
||
def decode(enc_content: bytearray, crc_divisor: str) -> Tuple[bool, datetime, str]: | ||
""" | ||
Decodes and validates a content encoded in a bytearray. | ||
Returns the validity flag, the datetime of the content creation and the textual | ||
content. If the CRC code is invalid the function returns (False, <any datetime>, | ||
''). Otherwise, it returns (True, <content creation datetime>, <textual content>). | ||
""" | ||
|
||
# Compute the CRC code of the content that should include its own CRC code. | ||
divisor_size = _get_divisor_size(crc_divisor) | ||
bin_body = _bytes_to_bin(enc_content[:-divisor_size]) | ||
bin_cr = _bytes_to_bin(enc_content[-divisor_size:])[-len(crc_divisor):] | ||
bin_cr = compute_crc(bin_body + bin_cr, crc_divisor) | ||
|
||
# For a valid content the computed CRC should be zero. | ||
if int(bin_cr, 2) == 0: | ||
divisor_size = _get_divisor_size(crc_divisor) | ||
# Decode the content creation timestamp. | ||
ts = struct.unpack('d', enc_content[:8])[0] | ||
# Decode the content. | ||
content = enc_content[8:-divisor_size].decode('utf8') | ||
return True, datetime.fromtimestamp(ts), content | ||
return False, datetime.min, '' | ||
|
||
|
||
def _open_shared_memory(storage_name: str, max_size: int, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I now also remembered the issue with shared memory and docker. All is nice and good, if you assume the default behavior of docker regarding ipc namespace. However, there exist the option --ipc=host, in that case the shared memory of the host is used and not cleared during docker stop. We need at least a warning in the docs and the entry point process needs try to remove it when stopping. With a process this would be given and we only would need to care about clearing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And, I had an idea how we don't need to care about this. For reading we can create a session copy of the DB with a session password. |
||
must_exist: bool) -> Optional[SharedMemory]: | ||
""" | ||
Creates and returns a shared memory accessor object. If must_exist == False creates | ||
the shared memory block if it doesn't exist. Otherwise, if the block doesn't exist | ||
returns None. | ||
""" | ||
|
||
try: | ||
return SharedMemory(name=storage_name, create=False, size=max_size) | ||
except FileNotFoundError: | ||
if must_exist: | ||
return None | ||
return SharedMemory(name=storage_name, create=True, size=max_size) | ||
|
||
|
||
def write_to_sm(content: str, creation_time: Optional[datetime] = None, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We currently use strings to pass the password around. This way we have no save way to cleans it the process it self. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here some solutions and packages are discussed https://stackoverflow.com/questions/728164/securely-erasing-password-in-memory-python There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We might also need to disable core dumps, but I need to first check how and where There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You would need to protect not just the secret store password but everything that the password itself is designed to protect - all sensitive data that goes in and out of the secret store - database credentials, AWS credentials, etc. Are we really up for that? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it is only about removing the password as fast as possible from the memory, sqlchipher probably did a key derivation on the password we gave it, so it won't save the original one in memory. |
||
crc_divisor: str = DEFAULT_CRC_DIVISOR, | ||
max_size: int = DEFAULT_MAX_SIZE, | ||
storage_name: str = DEFAULT_STORAGE_NAME) -> bool: | ||
""" | ||
Saves a content and its creation time in a shared memory. | ||
|
||
The named shared memory block may or may not be already allocated. The function | ||
creates or opens the block, writes the encoded content and closes the block. | ||
Currently, there are no provisions for the destruction of the block. | ||
|
||
The content gets prepended by its length in bytes, so that a reading function knows | ||
how many bytes to read. | ||
|
||
If the total length of the content doesn't fit into the maximum size of the shared | ||
memory block the function does nothing and returns False. Otherwise, if the content | ||
is successfully stored into the shared memory, it returns True. | ||
|
||
Parameters: | ||
content - The content string to be stored into the stored to the | ||
shared memory. | ||
creation_time - Time when the content was created, which will also be stored to the | ||
shared memory. | ||
If not provided the current time will be used. | ||
crc_divisor - A binary string used for computing the CRC. | ||
max_size - Maximum size of the shared memory block in bytes. | ||
storage_name - Name of the shared memory block. | ||
""" | ||
|
||
# Encode the content and check if it fits into the shared memory block | ||
creation_time = creation_time or datetime.now() | ||
cont_size, enc_content = encode(content, crc_divisor, creation_time) | ||
size_size = _get_byte_size(max_size) | ||
total_size = cont_size + size_size | ||
if total_size > max_size: | ||
return False | ||
|
||
# Open or create the named shared memory block. | ||
pwd_memory = _open_shared_memory(storage_name, max_size, False) | ||
if pwd_memory is None: | ||
return False | ||
try: | ||
# Write the content size followed by the content itself. | ||
pwd_memory.buf[:size_size] = cont_size.to_bytes(size_size, byteorder='big', | ||
signed=False) | ||
pwd_memory.buf[size_size:total_size] = enc_content | ||
finally: | ||
pwd_memory.close() | ||
return True | ||
|
||
|
||
def read_from_sm(crc_divisor: str = DEFAULT_CRC_DIVISOR, | ||
max_size: int = DEFAULT_MAX_SIZE, | ||
storage_name: str = DEFAULT_STORAGE_NAME) -> Tuple[bool, datetime, str]: | ||
""" | ||
Reads from the shared memory a content and the time when it was created . | ||
|
||
The shared memory block must already exist and hold a valid content. Like its | ||
writing counterpart, this function opens and closes the shared memory block, but | ||
doesn't destroy it afterward. | ||
|
||
The content must be prepended by its length, so that the function knows how many | ||
bytes to read. | ||
|
||
The function returns (True, <content creation datetime>, <textual content>) if a | ||
valid content has been successfully retrieved from the shared memory. If the content | ||
is empty or too big (if the size is to be believed), or it has been deemed invalid | ||
the function returns (False, <any datetime>, ''). | ||
|
||
Parameters: | ||
crc_divisor - A binary string used for computing the CRC. | ||
max_size - Maximum size of the shared memory block in bytes. | ||
storage_name - Name of the shared memory block. | ||
""" | ||
|
||
size_size = _get_byte_size(max_size) | ||
|
||
pwd_memory = _open_shared_memory(storage_name, max_size, True) | ||
if pwd_memory is None: | ||
return False, datetime.min, '' | ||
try: | ||
cont_size = int.from_bytes(pwd_memory.buf[:size_size], byteorder='big') | ||
total_size = cont_size + size_size | ||
# Check if the size makes sense | ||
if cont_size == 0 or total_size > max_size: | ||
return False, datetime.min, '' | ||
# Reade and decode the content. | ||
enc_content = bytearray(pwd_memory.buf[size_size:total_size]) | ||
return decode(enc_content, crc_divisor) | ||
finally: | ||
pwd_memory.close() | ||
|
||
|
||
def clear_sm(max_size: int = DEFAULT_MAX_SIZE, storage_name: str = DEFAULT_STORAGE_NAME, | ||
delete_storage: bool = False) -> None: | ||
""" | ||
Invalidates the content stored in shared memory by setting its length to zero and | ||
optionally destroys the shared memory block. The latter may not take an immediate | ||
effect though. | ||
|
||
Parameters: | ||
max_size - Maximum size of the shared memory block in bytes. | ||
storage_name - Name of the shared memory block. | ||
delete_storage - If True will destroy the shared memory block. | ||
""" | ||
|
||
pwd_memory = _open_shared_memory(storage_name, max_size, True) | ||
if pwd_memory is not None: | ||
try: | ||
size_size = _get_byte_size(max_size) | ||
if size_size <= max_size: | ||
cont_size = 0 | ||
pwd_memory.buf[:size_size] = cont_size.to_bytes( | ||
size_size, byteorder='big', signed=False) | ||
if delete_storage: | ||
pwd_memory.unlink() | ||
finally: | ||
pwd_memory.close() |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
from datetime import datetime | ||
from unittest import mock | ||
from exasol.shared_memory_vault import (compute_crc, encode, decode, write_to_sm, | ||
read_from_sm, clear_sm) | ||
|
||
|
||
def test_compute_crc(): | ||
assert compute_crc('11010011101100000', '011') == '100' | ||
|
||
|
||
def test_encode_decode(): | ||
|
||
dt = datetime(year=2023, month=12, day=11, hour=16, minute=35, second=21) | ||
content = 'Supercalifragilisticexpialidocious' | ||
divisor = '10011010' | ||
success, enc_content = encode(content, divisor, dt) | ||
assert success | ||
success, dt_out, content_out = decode(enc_content, divisor) | ||
assert success | ||
assert dt_out == dt | ||
assert content_out == content | ||
|
||
|
||
def test_encode_corrupt_decode(): | ||
|
||
content = 'Go ahead, make my day.' | ||
divisor = '10011010' | ||
_, enc_content = encode(content, divisor, datetime.now()) | ||
enc_content[0] ^= 127 | ||
success, _, _ = decode(enc_content, divisor) | ||
assert not success | ||
|
||
|
||
@mock.patch("exasol.shared_memory_vault._open_shared_memory") | ||
def test_write_read(mock_sm_factory): | ||
|
||
max_size = 200 | ||
mock_sm = mock.MagicMock() | ||
mock_sm.buf = bytearray(max_size) | ||
mock_sm_factory.return_value = mock_sm | ||
divisor = '100110100011' | ||
content = 'The truth will set you free.' | ||
dt = datetime(year=2023, month=12, day=12, hour=8, minute=39, second=45) | ||
success = write_to_sm(content, creation_time=dt, crc_divisor=divisor, | ||
max_size=max_size) | ||
assert success | ||
success, dt_out, content_out = read_from_sm(crc_divisor=divisor, max_size=max_size) | ||
assert success | ||
assert dt_out == dt | ||
assert content_out == content | ||
|
||
|
||
@mock.patch("exasol.shared_memory_vault._open_shared_memory") | ||
def test_write_corrupt_read(mock_sm_factory): | ||
|
||
max_size = 200 | ||
mock_sm = mock.MagicMock() | ||
mock_sm.buf = bytearray(max_size) | ||
mock_sm_factory.return_value = mock_sm | ||
divisor = '100110100011' | ||
content = 'The truth will set you free.' | ||
dt = datetime(year=2023, month=12, day=12, hour=8, minute=39, second=45) | ||
write_to_sm(content, creation_time=dt, crc_divisor=divisor, max_size=max_size) | ||
mock_sm.buf = bytearray(max_size) | ||
mock_sm.buf[10] = mock_sm.buf[10] | ||
success, _, _ = read_from_sm(crc_divisor=divisor, max_size=max_size) | ||
assert not success | ||
|
||
|
||
@mock.patch("exasol.shared_memory_vault._open_shared_memory") | ||
def test_read_fail_no_sm(mock_sm_factory): | ||
|
||
# Simulate the case when the shared memory block doesn't exist. | ||
mock_sm_factory.return_value = None | ||
max_size = 200 | ||
divisor = '100110100011' | ||
success, _, _ = read_from_sm(crc_divisor=divisor, max_size=max_size) | ||
assert not success | ||
|
||
|
||
@mock.patch("exasol.shared_memory_vault._open_shared_memory") | ||
def test_write_fail_insufficient_memory(mock_sm_factory): | ||
|
||
max_size = 50 | ||
mock_sm = mock.MagicMock() | ||
mock_sm.buf = bytearray(max_size) | ||
mock_sm_factory.return_value = mock_sm | ||
divisor = '100110100011' | ||
content = 'If you want something said, ask a man; ' \ | ||
'if you want something done, ask a woman.' | ||
dt = datetime(year=2023, month=12, day=12, hour=9, minute=19, second=10) | ||
success = write_to_sm(content, creation_time=dt, crc_divisor=divisor, | ||
max_size=max_size) | ||
assert not success | ||
|
||
|
||
@mock.patch("exasol.shared_memory_vault._open_shared_memory") | ||
def test_write_clear_read(mock_sm_factory): | ||
|
||
max_size = 200 | ||
mock_sm = mock.MagicMock() | ||
mock_sm.buf = bytearray(max_size) | ||
mock_sm_factory.return_value = mock_sm | ||
divisor = '100110100011' | ||
content = 'The truth will set you free.' | ||
dt = datetime(year=2023, month=12, day=12, hour=8, minute=39, second=45) | ||
write_to_sm(content, creation_time=dt, crc_divisor=divisor, max_size=max_size) | ||
clear_sm(max_size=max_size) | ||
success, _, _ = read_from_sm(crc_divisor=divisor, max_size=max_size) | ||
assert not success |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would compute this by myself. Python itself doesn't provide crc, but a hash algorithm should also be fine here. Performance isn't that critical. Or, alternative we need to import https://pypi.org/project/crc/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternative, there is a crc32 in the standard lib https://docs.python.org/3/library/binascii.html?highlight=crc32#binascii.crc32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I will leave it as it is for now. The chances are it will be gone. For example, if we start using the file lock we definitely can scrap the CRC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍