Stream encrypt using functor based chunk_storage #396

Merged
merged 41 commits into from
Dec 17, 2024
Commits
b2292b1
feat(storage): add recursive data map retrieval for decryption
dirvine Nov 15, 2024
8bfc193
refactor(decrypt): rename and restrict visibility of decrypt function
dirvine Nov 15, 2024
756b07c
refactor(api): remove unused seek functionality
dirvine Nov 15, 2024
a497b7c
refactor: move utility functions from lib.rs to utils.rs
dirvine Nov 15, 2024
d9fb03d
refactor: stream encryption code, fix tests, and improve error handling
dirvine Nov 16, 2024
642e58f
refactor: move stream out of use for now
dirvine Nov 16, 2024
2720c5a
fix: improve chunk handling and add comprehensive tests
dirvine Nov 17, 2024
63b3d44
fix: improve Python bindings and add comprehensive tests
dirvine Nov 17, 2024
e0077ea
feat(python): Enhance Python Bindings, Add Streaming Decrypt, and Imp…
dirvine Nov 17, 2024
080cc24
feat: update breaking change
dirvine Nov 17, 2024
a921482
feat(verify): Add Chunk Verification and Clean Up Python Bindings
dirvine Nov 17, 2024
2ba9389
fix: update benchmarks to use public API
dirvine Nov 18, 2024
f3882c1
fix: clippy
dirvine Nov 18, 2024
cbaa0fd
feat(docs): Add comprehensive Python documentation and docstrings
dirvine Nov 18, 2024
1994b1a
feat(cli): Add command-line interface for self-encryption
dirvine Nov 19, 2024
87f5593
feat: patch bump
dirvine Nov 19, 2024
b0ca5e5
feat(docs): enhance Python bindings documentation and CLI
dirvine Nov 19, 2024
4034ef3
feat: initial commit of functor based encrypt
dirvine Nov 28, 2024
402f5fd
refactor(encrypt): improve streaming encryption implementation
dirvine Nov 28, 2024
1714681
feat(python): Add streaming encryption and update documentation
dirvine Nov 28, 2024
f98e966
fix: bindings
dirvine Nov 29, 2024
cd9e7fe
fix: bindings
dirvine Nov 29, 2024
23c2b5c
fix: bindings
dirvine Nov 29, 2024
dcaf44c
fix: bindings
dirvine Nov 29, 2024
bd97c4c
fix: bindings
dirvine Nov 29, 2024
abf811d
fix: bindings
dirvine Nov 29, 2024
b30204e
fix: bindings
dirvine Nov 29, 2024
6b69b97
fix: bindings
dirvine Nov 29, 2024
19561f2
fix: bindings
dirvine Nov 29, 2024
12f527d
fix: bind
dirvine Nov 29, 2024
fc21dc9
fix: bindings
dirvine Nov 29, 2024
a708611
fix: bindings
dirvine Nov 29, 2024
e18931e
fix: bindings
dirvine Nov 29, 2024
d372e15
fix: bindings
dirvine Nov 29, 2024
90f2b05
refactor: improve streaming encryption memory usage
dirvine Nov 29, 2024
5794755
test: add encryption algorithm consistency test
dirvine Nov 29, 2024
709713c
feat: add command line arguments to parallel decryptor example
dirvine Nov 29, 2024
462a3c6
fix: clippy and fmt
dirvine Dec 10, 2024
ae07b68
refactor: rename encryption module to aes
dirvine Dec 10, 2024
389b305
refactor(streaming_encrypt): Implement single-pass streaming encryption
dirvine Dec 10, 2024
3466715
Revert "refactor: move stream out of use for now"
dirvine Dec 12, 2024
feat(verify): Add Chunk Verification and Clean Up Python Bindings
This commit adds chunk verification functionality and improves the Python bindings:

Features:
- Add verify_chunk function to validate chunk content against expected hash
- Add comprehensive tests for chunk verification
- Clean up Python bindings naming and structure

Python Binding Changes:
- Remove 'py_' prefix from all Python-exposed functions for cleaner API
- Fix module structure to properly expose functionality
- Add chunk verification to Python API
- Update Python tests to cover all functionality including verification
- Fix module naming and import issues

Testing:
- Add test_verify_chunk to Python test suite
- Add Rust test for verify_chunk functionality
- Enhance comprehensive test coverage
- Add cross-platform compatibility tests

Documentation:
- Update README with chunk verification examples in both Rust and Python
- Clean up Python API documentation
- Add type hints and docstrings to Python bindings

Build System:
- Fix pyproject.toml configuration
- Update module naming in setup.py
- Fix Python package structure
- Add proper dependencies

The changes provide a more robust way to verify chunk integrity and improve the overall usability of the Python bindings.
dirvine committed Nov 17, 2024
commit a921482449cdf63312e8f512d290bdfebe461926
2 changes: 2 additions & 0 deletions .gitignore
@@ -21,5 +21,7 @@ target/
*.egg
*.whl
*.so
*pyc*



103 changes: 103 additions & 0 deletions README.md
@@ -546,3 +546,106 @@ def advanced_example():
- Error handling follows Python conventions with descriptive exceptions
- Supports both synchronous and parallel chunk processing
- Memory efficient through streaming operations

### Chunk Verification

#### Rust
```rust
use self_encryption::{verify_chunk, Result, XorName};

// Verify a chunk matches its expected hash
fn verify_example() -> Result<()> {
    let chunk_hash = XorName([0; 32]); // 32-byte hash
    let chunk_content = vec![1, 2, 3]; // Raw chunk content

    match verify_chunk(chunk_hash, &chunk_content) {
        Ok(_chunk) => println!("Chunk verified successfully"),
        Err(e) => println!("Chunk verification failed: {}", e),
    }
    Ok(())
}
```

The `verify_chunk` function provides a way to verify chunk integrity:
- Takes a `XorName` hash and chunk content as bytes
- Verifies the content matches the hash
- Returns a valid `EncryptedChunk` if verification succeeds
- Returns an error if verification fails

#### Python
```python
from self_encryption import XorName, verify_chunk

def verify_example():
    # Get a chunk and its expected hash from somewhere
    chunk_hash = XorName(bytes.fromhex("0" * 64))  # 32-byte hash wrapped in a XorName
    chunk_content = b"..."  # Raw chunk content

    try:
        # Verify and get a usable chunk
        verified_chunk = verify_chunk(chunk_hash, chunk_content)
        print("Chunk verified successfully")
    except ValueError as e:
        print(f"Chunk verification failed: {e}")
```

The Python `verify_chunk` function provides similar functionality:
- Takes a `XorName` (wrapping the expected 32-byte hash) and the chunk content as bytes
- Verifies the content matches the hash
- Returns a valid `EncryptedChunk` if verification succeeds
- Raises `ValueError` if verification fails

This functionality is particularly useful for:
- Verifying chunk integrity after network transfer
- Validating chunks in storage systems
- Debugging chunk corruption issues
- Implementing chunk validation in client applications
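
Conceptually, verification is a content-addressing check: hash the bytes and compare the digest with the expected name. The sketch below is a pure-Python stand-in rather than the library's implementation. It assumes the name is a SHA3-256 digest of the content (which is how the `xor_name` crate is understood to derive names), and `check_chunk` and `partition_chunks` are hypothetical helper names used only for illustration.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple


def check_chunk(expected_hash: bytes, content: bytes) -> bytes:
    """Return content if its SHA3-256 digest equals expected_hash, else raise ValueError."""
    if len(expected_hash) != 32:
        raise ValueError("expected_hash must be 32 bytes")
    actual = hashlib.sha3_256(content).digest()  # assumption: names are SHA3-256 digests
    if not hmac.compare_digest(actual, expected_hash):
        raise ValueError(
            f"Chunk content hash mismatch. Expected: {expected_hash.hex()}, Got: {actual.hex()}"
        )
    return content


def partition_chunks(fetched: Dict[bytes, bytes]) -> Tuple[List[bytes], List[bytes]]:
    """Split fetched {hash: content} pairs into (valid hashes, corrupted hashes)."""
    valid, corrupted = [], []
    for expected_hash, content in fetched.items():
        try:
            check_chunk(expected_hash, content)
            valid.append(expected_hash)
        except ValueError:
            corrupted.append(expected_hash)
    return valid, corrupted
```

A caller that has just fetched a batch of chunks over the network could run `partition_chunks` on the batch and re-request only the hashes reported as corrupted.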

### XorName Operations

The `XorName` class provides functionality for working with cryptographic names and hashes:

```python
from self_encryption import XorName, verify_chunk

# Create a XorName from content
content = b"Hello, World!"
name = XorName.from_content(content)
print(f"Content hash: {''.join(format(b, '02x') for b in name.as_bytes())}")

# Create a XorName directly from bytes (must be 32 bytes)
hash_bytes = bytes([x % 256 for x in range(32)])  # Example 32-byte array
name = XorName(hash_bytes)

# Get the underlying bytes
raw_bytes = name.as_bytes()

# Common use cases:
# 1. Verify chunk content matches its hash
def verify_chunk_example():
    # Get a chunk and its expected hash
    chunk_content = b"..."  # Raw chunk content
    expected_hash = XorName.from_content(chunk_content)

    # Verify the chunk
    verified_chunk = verify_chunk(expected_hash, chunk_content)
    print("Chunk verified successfully")

# 2. Track chunks by their content hash
def track_chunks_example():
    chunks = {}  # Dict to store chunks by hash

    # Store a chunk (bytes(...) also covers as_bytes() returning a list of ints)
    content = b"Some chunk content"
    chunk_hash = XorName.from_content(content)
    chunks[bytes(chunk_hash.as_bytes()).hex()] = content

    # Retrieve a chunk
    retrieved = chunks.get(bytes(chunk_hash.as_bytes()).hex())
```

The `XorName` class provides:
- `from_content(bytes) -> XorName`: Creates a XorName by hashing the provided content
- `__init__(bytes) -> XorName`: Creates a XorName from an existing 32-byte hash
- `as_bytes() -> bytes`: Returns the underlying 32-byte array
- Used for chunk verification and tracking in the self-encryption process
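
The contract listed above separates two constructors: `__init__` wraps an existing 32-byte hash unchanged, while `from_content` derives the hash from the data. A minimal pure-Python sketch of that contract follows. `NameSketch` is a hypothetical stand-in, not the binding itself, and SHA3-256 is an assumption about the hash function.

```python
import hashlib


class NameSketch:
    """Hypothetical stand-in for XorName: a 32-byte content-derived name."""

    def __init__(self, raw: bytes):
        # Wrap an existing hash unchanged; reject anything that is not 32 bytes.
        if len(raw) != 32:
            raise ValueError(f"expected 32 bytes, got {len(raw)}")
        self._raw = bytes(raw)

    @classmethod
    def from_content(cls, content: bytes) -> "NameSketch":
        # Derive the name by hashing the content (SHA3-256 is an assumption).
        return cls(hashlib.sha3_256(content).digest())

    def as_bytes(self) -> bytes:
        return self._raw

    def __eq__(self, other):
        return isinstance(other, NameSketch) and self._raw == other._raw

    def __hash__(self):
        # Hashable, so names can be used directly as dict keys.
        return hash(self._raw)


name = NameSketch.from_content(b"Hello, World!")
same = NameSketch(name.as_bytes())  # round-trips without rehashing
assert name == same
```

Under this contract, passing `as_bytes()` back into the constructor yields an equal name, which is what makes it safe to store hashes as raw bytes and rebuild names from them later.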
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -18,7 +18,8 @@ dependencies = [
[tool.maturin]
features = ["python"]
module-name = "self_encryption._self_encryption"
python-source = "self_encryption"
bindings = "pyo3"
develop = true

[tool.pytest.ini_options]
testpaths = ["tests"]
4 changes: 4 additions & 0 deletions self_encryption/__init__.py
@@ -1,21 +1,25 @@
from ._self_encryption import (
DataMap,
EncryptedChunk,
XorName,
encrypt,
encrypt_from_file,
decrypt,
decrypt_from_storage,
shrink_data_map,
streaming_decrypt_from_storage,
verify_chunk,
)

__all__ = [
"DataMap",
"EncryptedChunk",
"XorName",
"encrypt",
"encrypt_from_file",
"decrypt",
"decrypt_from_storage",
"shrink_data_map",
"streaming_decrypt_from_storage",
"verify_chunk",
]
21 changes: 0 additions & 21 deletions self_encryption/self_encryption/__init__.py

This file was deleted.

34 changes: 33 additions & 1 deletion src/lib.rs
@@ -103,6 +103,8 @@ mod utils;

pub use decrypt::decrypt_chunk;
use utils::*;
+ pub use xor_name::XorName;


pub use self::{
data_map::{ChunkInfo, DataMap},
@@ -115,7 +117,6 @@ use std::{
io::{Read, Write},
path::Path,
};
- use xor_name::XorName;

// export these because they are used in our public API.
pub use bytes;
@@ -615,6 +616,37 @@ pub fn deserialize<T: serde::de::DeserializeOwned>(bytes: &[u8]) -> Result<T> {
.map_err(|e| Error::Generic(format!("Deserialization error: {}", e)))
}

/// Verifies a chunk by checking that the hash of its content matches the provided name.
///
/// # Arguments
///
/// * `name` - The expected XorName hash of the chunk content
/// * `bytes` - The raw encrypted chunk content to verify
///
/// # Returns
///
/// * `Result<EncryptedChunk>` - A chunk wrapping `bytes` if verification succeeds
/// * `Error` - If the content hash doesn't match `name`
pub fn verify_chunk(name: XorName, bytes: &[u8]) -> Result<EncryptedChunk> {
    // Calculate the hash of the encrypted content directly
    let calculated_hash = XorName::from_content(bytes);

    // Verify the hash matches before copying the content
    if calculated_hash != name {
        return Err(Error::Generic(format!(
            "Chunk content hash mismatch. Expected: {:?}, Got: {:?}",
            name, calculated_hash
        )));
    }

    // Wrap the verified bytes in an EncryptedChunk
    Ok(EncryptedChunk {
        content: Bytes::copy_from_slice(bytes),
    })
}

#[cfg(test)]
mod data_map_tests {
use super::*;
37 changes: 37 additions & 0 deletions src/python.rs
@@ -25,6 +25,12 @@ struct PyEncryptedChunk {
inner: RustEncryptedChunk,
}

#[pyclass(name = "XorName")]
#[derive(Clone)]
struct PyXorName {
inner: XorName,
}

#[pymethods]
impl PyDataMap {
#[new]
@@ -108,6 +114,27 @@ impl PyEncryptedChunk {
}
}

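#[pymethods]
impl PyXorName {
    #[new]
    fn new(bytes: &PyBytes) -> PyResult<Self> {
        // Wrap an existing 32-byte hash as-is; `from_content` is the hashing constructor.
        let data = bytes.as_bytes();
        if data.len() != 32 {
            return Err(PyErr::new::<pyo3::exceptions::PyValueError, _>(
                "XorName must be exactly 32 bytes",
            ));
        }
        let mut raw = [0u8; 32];
        raw.copy_from_slice(data);
        Ok(Self {
            inner: XorName(raw),
        })
    }

    #[staticmethod]
    fn from_content(content: &PyBytes) -> Self {
        Self {
            inner: XorName::from_content(content.as_bytes()),
        }
    }

    fn as_bytes(&self) -> Vec<u8> {
        self.inner.0.to_vec()
    }
}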

#[pyfunction]
fn encrypt(_py: Python<'_>, data: &PyBytes) -> PyResult<(PyDataMap, Vec<PyEncryptedChunk>)> {
let bytes = Bytes::from(data.as_bytes().to_vec());
@@ -211,15 +238,25 @@ fn streaming_decrypt_from_storage(
.map_err(|e| PyErr::new::<pyo3::exceptions::PyValueError, _>(e.to_string()))
}

#[pyfunction]
fn verify_chunk(name: &PyXorName, content: &PyBytes) -> PyResult<PyEncryptedChunk> {
match crate::verify_chunk(name.inner, content.as_bytes()) {
Ok(chunk) => Ok(PyEncryptedChunk { inner: chunk }),
Err(e) => Err(PyErr::new::<pyo3::exceptions::PyValueError, _>(e.to_string())),
}
}

#[pymodule]
fn _self_encryption(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
m.add_class::<PyDataMap>()?;
m.add_class::<PyEncryptedChunk>()?;
m.add_class::<PyXorName>()?;
m.add_function(wrap_pyfunction!(encrypt, m)?)?;
m.add_function(wrap_pyfunction!(encrypt_from_file, m)?)?;
m.add_function(wrap_pyfunction!(decrypt, m)?)?;
m.add_function(wrap_pyfunction!(decrypt_from_storage, m)?)?;
m.add_function(wrap_pyfunction!(shrink_data_map, m)?)?;
m.add_function(wrap_pyfunction!(streaming_decrypt_from_storage, m)?)?;
m.add_function(wrap_pyfunction!(verify_chunk, m)?)?;
Ok(())
}
64 changes: 63 additions & 1 deletion tests/integration_tests.rs
@@ -1,7 +1,7 @@
use bytes::Bytes;
use self_encryption::{
decrypt, decrypt_from_storage, encrypt, encrypt_from_file, get_root_data_map, shrink_data_map,
- streaming_decrypt_from_storage, test_helpers::random_bytes, DataMap, EncryptedChunk, Error, Result,
+ streaming_decrypt_from_storage, test_helpers::random_bytes, verify_chunk, DataMap, EncryptedChunk, Error, Result,
};
use std::{
collections::HashMap,
@@ -651,3 +651,65 @@ fn test_streaming_decrypt_with_parallel_retrieval() -> Result<()> {

Ok(())
}

#[test]
fn test_chunk_verification() -> Result<()> {
    let storage = StorageBackend::new()?;
    let temp_dir = TempDir::new()?;

    // Create test data and encrypt it
    let test_size = 5 * 1024 * 1024; // 5 MiB
    let data = random_bytes(test_size);
    let input_path = temp_dir.path().join("input.dat");
    File::create(&input_path)?.write_all(&data)?;

    // Encrypt file to get some chunks
    let (data_map, _) = encrypt_from_file(&input_path, storage.disk_dir.path())?;

    // Get the first chunk info and content
    let first_chunk_info = &data_map.infos()[0];
    let chunk_path = storage.disk_dir.path().join(hex::encode(first_chunk_info.dst_hash));
    let mut chunk_content = Vec::new();
    File::open(&chunk_path)?.read_to_end(&mut chunk_content)?;

    // Test 1: Verify valid chunk
    let verified_chunk = verify_chunk(first_chunk_info.dst_hash, &chunk_content)?;
    assert_eq!(
        verified_chunk.content, chunk_content,
        "Verified chunk content should match original"
    );

    // Test 2: Try with wrong hash
    let mut wrong_hash = first_chunk_info.dst_hash.0;
    wrong_hash[0] ^= 1; // Flip one bit
    let wrong_name = XorName(wrong_hash);
    assert!(
        verify_chunk(wrong_name, &chunk_content).is_err(),
        "Should fail with incorrect hash"
    );

    // Test 3: Try with corrupted content
    let mut corrupted_content = chunk_content.clone();
    if !corrupted_content.is_empty() {
        corrupted_content[0] ^= 1; // Flip one bit
    }
    assert!(
        verify_chunk(first_chunk_info.dst_hash, &corrupted_content).is_err(),
        "Should fail with corrupted content"
    );

    // Test 4: Verify all chunks from encryption
    println!("\nVerifying all chunks from encryption:");
    for (i, info) in data_map.infos().iter().enumerate() {
        let chunk_path = storage.disk_dir.path().join(hex::encode(info.dst_hash));
        let mut chunk_content = Vec::new();
        File::open(&chunk_path)?.read_to_end(&mut chunk_content)?;

        // Propagate failures so a bad chunk fails the test instead of only being logged
        verify_chunk(info.dst_hash, &chunk_content)?;
        println!("✓ Chunk {} verified successfully", i);
    }

    Ok(())
}
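
The two negative cases in the test (one flipped bit in the hash, one flipped bit in the content) can be reproduced without the library. This is a sketch under the assumption that names are SHA3-256 digests of the chunk content; `matches` and `flip_lowest_bit` are hypothetical helpers, not part of the crate or its bindings.

```python
import hashlib


def matches(expected_hash: bytes, content: bytes) -> bool:
    """True when the SHA3-256 digest of content equals expected_hash."""
    return hashlib.sha3_256(content).digest() == expected_hash


def flip_lowest_bit(data: bytes, index: int = 0) -> bytes:
    """Return a copy of data with the lowest bit of data[index] flipped."""
    mutated = bytearray(data)
    mutated[index] ^= 1
    return bytes(mutated)


content = bytes(range(256)) * 4  # stand-in for an encrypted chunk
name = hashlib.sha3_256(content).digest()

assert matches(name, content)                       # valid chunk
assert not matches(flip_lowest_bit(name), content)  # wrong hash
assert not matches(name, flip_lowest_bit(content))  # corrupted content
```

A single flipped bit on either side is enough to break the match, which is why the test only needs to mutate one byte in each case.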