Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable storage provider #1937

Closed
sainak opened this issue Mar 2, 2024 · 8 comments
Closed

Configurable storage provider #1937

sainak opened this issue Mar 2, 2024 · 8 comments

Comments

@sainak
Copy link
Member

sainak commented Mar 2, 2024

Currently, we only support s3 compatible storage providers for storing and retrieving files, we need to make the storage provider more configurable

we have 2 buckets currently:

  • for serving facility cover images (public)
  • for serving sensitive patient files (private)

The solution is to create a configurable storage provider class that can support s3/azure blob storage/local storage to serve files.

requirements:

  • there should be a strong access control to prevent public access to patient/internal files
  • we should be able to serve static files and cover images thought the public bucket
  • storage providers should expose signed upload URLs for patient files
  • storage providers should expose patient files only with signed URLs that can expire after a certain amount of time
  • have proper checks for file mime-types
  • add proper test cases to handle the above requirements

Share your implementation plan before working on this issue

@sainak sainak added good first issue Good for newcomers P2 labels Mar 2, 2024
@ab1123
Copy link

ab1123 commented Mar 2, 2024

Can I work on this issue??

@balaji-sivasakthi
Copy link

@sainak Will you provide test secrets and credentials to work on this issue?

@sainak
Copy link
Member Author

sainak commented Mar 3, 2024

@balaji-sivasakthi you will have to use local development tools like localstack and azureite

@balaji-sivasakthi
Copy link

balaji-sivasakthi commented Mar 3, 2024

@sainak Is these tools FOSS?

@sainak
Copy link
Member Author

sainak commented Mar 4, 2024

@sainak Is these tools FOSS?

For our use case, yes

@rash-27
Copy link
Contributor

rash-27 commented Mar 4, 2024

  • for this already the security checks are made in the serializers
  • So , my idea is to create a class and use the instance of it where presently the s3 bucket is used (i.e the methods of the models that return URL and etc, in case of file storage) so that multiple storage providers can be accessed via it,
  • the basic idea of the class :
    class StorageProvider:
    def init(self , provider ,bucket_type ):
    if provider == "s3":
    self.client = boto3.client("s3", **config)
    //similarly we can provide the options for remaining storage providers
    // Here implementing the methods required , like signed_url , put_object , get_object , etc ,
  • regarding private and public buckets is we can decide with the help of bucket type and we can change the upload url accordingly
    @sainak This is the basic idea of my implementation, and mention if I am wrong somewhere or suggest any changes need to done

@balaji-sivasakthi
Copy link

balaji-sivasakthi commented Mar 5, 2024

Approach

To decouple the storage provider from specific use cases like patient and facility, we can introduce more flexibility by allowing the specification of bucket types and access controls through parameters. We'll utilize enums for access control and provide the bucket name as a parameter.

class ClientConfig(TypedDict):
    region_name: str
    aws_access_key_id: str
    aws_secret_access_key: str
    endpoint_url: str

class CSProvider(enum.Enum):
    AWS = "AWS"
    GCP = "GCP"
    AZURE = 'AZURE'
    DOCKER = "DOCKER"  # localstack in docker
    LOCAL = "LOCAL"  # localstack on host

class AccessControl(enum.Enum):
    PUBLIC = "PUBLIC"
    PRIVATE = "PRIVATE"

class BucketType(enum.Enum):
    PATIENT = "PATIENT"
    FACILITY = "FACILITY"

Then we will create a factory method for storages based on the CSProvider enum, and also initialise the clientConfig and bucket name for getting the signed url

I will add a class method named from_provider inside the ConfigurableStorageProvider class. This method acts as a factory method to create instances of ConfigurableStorageProvider based on the specified CSProvider. When calling the from_provider method, you provide the desired CSProvider, account_url, and credential. It internally creates a ClientConfig object and initialises the ConfigurableStorageProvider instance accordingly. This approach makes it easier to instantiate the ConfigurableStorageProvider class by encapsulating the logic of creating the ClientConfig object based on the specified provider

LLD

def __init__(self, provider: CSProvider, client_config: ClientConfig):
def from_provider(cls, provider: CSProvider, account_url: str, credential: str) -> 'ConfigurableStorageProvider':
def get_signed_url(self, bucket_name: BucketName, object_key: str, expiration: int, access_control: AccessControl) -> str:
def check_mime_type(self, file_path: str, expected_mime_type: str) -> bool:

cc @sainak @rash-27

@amjithtitus09
Copy link
Contributor

@sainak Please share your feedback for the below approach

Implementation Plan

To make the storage provider configuration more flexible and support multiple types (AWS S3, Azure Blob Storage, Local Storage), this would be my approach:

1. Define Abstract Base Class for Storage Providers

Create an abstract base class BaseStorageProvider that will define the common interface for all storage providers, ensuring consistency.

2. Implement Concrete Storage Provider Classes

Implement concrete storage provider classes for AWS S3, Azure Blob Storage, and Local Storage, each inheriting from the BaseStorageProvider class.

3. Centralize Configuration Management

Update the config.py to handle configuration for all supported storage providers and bucket types (e.g., Patient, Facility). This will include settings for AWS S3, Azure Blob Storage, and Local Storage.

4. Create Utility Functions for Storage Provider Initialization

Create utility functions in storage_utils.py to initialize and return the appropriate storage provider based on the configuration.

5. Integrate MIME Type Checks

Centralize MIME type checks in a utility module mime_type_utils.py to avoid duplication and ensure consistency across different storage providers.

base_storage_provider.py

class BaseStorageProvider(ABC):

    @abstractmethod
    def get_signed_upload_url(self, filename: str, mime_type: str) -> str:
        pass

    @abstractmethod
    def get_signed_download_url(self, filename: str) -> str:
        pass

    @abstractmethod
    def upload_file(self, file: bytes, filename: str, mime_type: str):
        pass

    @abstractmethod
    def download_file(self, filename: str) -> bytes:
        pass

config.py

...
def get_facility_bucket_config(external: bool) -> tuple[ClientConfig, BucketName]:
    if settings.CS_PROVIDER == CSProvider.AWS:
        return {
            "region_name": settings.FACILITY_S3_REGION,
            "aws_access_key_id": settings.FACILITY_S3_KEY,
            "aws_secret_access_key": settings.FACILITY_S3_SECRET,
            "endpoint_url": settings.FACILITY_S3_BUCKET_EXTERNAL_ENDPOINT if external else settings.FACILITY_S3_BUCKET_ENDPOINT,
        }, settings.FACILITY_S3_BUCKET
    elif settings.CS_PROVIDER == CSProvider.AZURE:
        return {
            "connection_string": settings.FACILITY_AZURE_CONNECTION_STRING,
        }, settings.FACILITY_AZURE_CONTAINER
    elif settings.CS_PROVIDER == CSProvider.LOCAL:
        return {}, settings.FACILITY_LOCAL_DIR
    raise ValueError("Unsupported Cloud Service Provider")
...

storage_utils.py

def get_storage_provider(bucket_type=BucketType.PATIENT):
    config, bucket_name = get_client_config(bucket_type, True)
    if settings.CS_PROVIDER == CSProvider.AWS:
        return S3StorageProvider(bucket_name, config)
    elif settings.CS_PROVIDER == CSProvider.AZURE:
        return AzureBlobStorageProvider(bucket_name, config)
    elif settings.CS_PROVIDER == CSProvider.LOCAL:
        return LocalStorageProvider(bucket_name, config)
    else:
        raise ValueError("Unsupported storage provider")

mime_type_utils.py

def is_allowed_mime_type(mime_type: str) -> bool:
    return mime_type in settings.ALLOWED_MIME_TYPES

def check_mime_type(mime_type: str):
    if not is_allowed_mime_type(mime_type):
        raise ValueError(f"MIME type {mime_type} not allowed")

@amjithtitus09 amjithtitus09 removed their assignment Aug 5, 2024
@vigneshhari vigneshhari closed this as not planned Won't fix, can't repro, duplicate, stale Jan 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants