Ideas for the future #520

Open
TheDeveloper10 opened this issue Dec 21, 2024 · 5 comments
Labels
enhancement New feature or request


@TheDeveloper10

I was looking for an s3 proxy to use for a project and then I found yours. It is the closest thing to what I needed but it still lacks the following features: encryption and replication.

Before finding your project I was thinking of implementing it on my own but yours is much closer to what I need - it has already implemented the core feature: s3 proxy and has an openid connect integration.

I just wanted to ask you if you're okay with the suggested features and if you want to collaborate on implementing them?
I'm looking forward to your response.

@TheDeveloper10 TheDeveloper10 added the enhancement New feature or request label Dec 21, 2024
@oxyno-zeta
Owner

Hello @TheDeveloper10 ,

Of course you can submit some pull requests!

I just wanted to know:

  • what does replication mean for you?
  • what are you looking for when you mention encryption?

Thanks for your issue !

Best regards,

Oxyno-zeta

@TheDeveloper10
Author

TheDeveloper10 commented Dec 23, 2024

Hello @oxyno-zeta ,
Thank you for the quick response!
I want to say sorry about my bad English - I'm writing it in a hurry.

  • I was imagining replication of an s3 bucket the following way:
    When you insert a key, it gets inserted into all buckets (or e.g. into one of them and it's queued for insertion into the rest of the buckets). When it gets deleted/updated, something similar happens. I need this feature because I want two s3 buckets - one in the cloud (e.g. Digital Ocean) and one on site (e.g. minio) - and if I lose access to the one on site I want to access the one in the cloud. I know the case is really weird. I saw that minio has a replication solution but tbh I don't really like it. This feature is definitely less important than the next one.
  • By encryption I mean having standard encryption algorithms, where the proxy encrypts the data on write to the backend s3 and decrypts it on read from the backend s3. I need encryption because I'm planning on hosting the s3 proxy at my homelab and I want to store large amounts of private data in the cloud. I want the proxy to encrypt the data so I can store data from any app and have it encrypted in the same way. I imagine it like EncryptionConfiguration from kubernetes - it encrypts the data before writing it to etcd and decrypts it on read. This, of course, can be combined with the auth - so that not everyone could decrypt every key.
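
To make the encrypt-on-write / decrypt-on-read idea concrete, here is a minimal sketch in Go using AES-GCM from the standard library. The function names `encryptObject` and `decryptObject`, the choice of AES-256-GCM, and the "nonce prepended to the ciphertext" layout are assumptions for illustration only; nothing like this exists in the project yet.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD from a 16, 24 or 32 byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encryptObject (hypothetical helper) is what the proxy would run on write.
// The stored layout is assumed to be: nonce || ciphertext || auth tag.
func encryptObject(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends the ciphertext and tag to the nonce slice.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decryptObject (hypothetical helper) is what the proxy would run on read.
// It fails if the stored bytes were modified or the key is wrong.
func decryptObject(key, stored []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(stored) < gcm.NonceSize() {
		return nil, errors.New("stored object is shorter than the nonce")
	}
	nonce, ciphertext := stored[:gcm.NonceSize()], stored[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	key := make([]byte, 32) // AES-256; a real key would come from configuration
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	stored, err := encryptObject(key, []byte("private data"))
	if err != nil {
		panic(err)
	}
	plain, err := decryptObject(key, stored)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain))
}
```

This sketch holds the whole object in memory, which would not fit the "large amounts of private data" use case as-is; a real implementation would probably need a chunked or streaming scheme.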

I'm willing to implement both features and make a pull request to your repository. I just wanted to know if you're willing to add them to your vision of the project. I'd also like to know what you think about it.

@oxyno-zeta
Owner

Hello @TheDeveloper10 ,

Thanks for your quick answer too!
No worries about your English, mine isn't very good either :) .

  • For the replication, this is what I was afraid of. The issue with what you want is the lack of a "transaction". The upload will be done on one platform and then the second one will be started. What about a failure on the second one and not on the first? We cannot be sure we can remove the first file. I'm afraid this feature shouldn't be handled by this project :( . [I'm open to discussing this and finding the best possible solution, I'm just brainstorming with you ;) ]. What don't you like about the minio replication solution?
  • That's very interesting! What do you think about a general configuration block in Config where we define the keys and algorithms to use, and then a list under each "target" where a path refers to one of those keys? Like for resources.

Thanks for your ideas and for sharing them before implementing them.

@TheDeveloper10
Author

TheDeveloper10 commented Dec 23, 2024

Hey @oxyno-zeta ,

  • For the encryption - your idea sounds interesting, I need to look at the project in a bit more detail but I think I get what you mean.
  • For the replication - I thought of the same issue regarding the lack of transaction. Here's an idea for a solution:
    We can add a state (e.g. a database - postgres so that the application can scale) for each entry.
    On write to key "/mykey/..." we upload the contents to the first bucket from the list of buckets (or to the first one that succeeds in persisting the file), then we write several entries in the table below:
  Key | Bucket | StartedReplicating | FinishedReplicating
  /registry/myentry | target-1 | <some-unix-timestamp> | <some-unix-timestamp>
  /registry/myentry | target-2 | null | null
  /registry/myentry | target-3 | null | null

After writing the entries, we return a response to the user.
Meanwhile, in the background, there's a goroutine that syncs replicas from this table (if StartedReplicating is null or older than some threshold, e.g. 5min, then the entry is not being actively replicated, so it can be selected for replication again).

Basically we use the first target (the first bucket in the list of buckets for replicas) as a pivot - writes, updates and gets happen from there. Then there's a syncing service (goroutine) that syncs with the other targets. If the first target fails for some reason, we use the table as a reference to find where the latest version of a given key is.

An example configuration (that's just a mockup - haven't checked with the existing configuration):

targets:
- name: digitalocean
  s3-url: ...
  s3-api-key: ...
  encryption:
  - aesgcm: ... # first layer of encryption
  - aescbc: ... # second layer of encryption
- name: minio
  s3-url: ...
  s3-api-key: ...
  encryption:
  - plain: {} # no encryption

rules:
- bucket_name: "memories"
  targets:
  - name: digitalocean # main target (first writes happen here)
  - name: minio
- bucket_name: "*" # if no other rule can satisfy the incoming request (fallback)
  targets:
  - name: "digitalocean"

Please let me know what you think of this setup.
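
For illustration, the lookup implied by the `rules` section of the mockup could work like the Go sketch below. `Rule` and `resolveTargets` are hypothetical names, and "an exact bucket name wins over the `*` fallback" is an assumption about how the mockup is meant to be read.

```go
package main

import "fmt"

// Rule mirrors one entry of the mockup's `rules` list.
// The first target is the main one, where the first write happens.
type Rule struct {
	BucketName string
	Targets    []string
}

// resolveTargets returns the targets of the rule whose bucket_name equals
// the requested bucket, falling back to the "*" rule when there is one.
// The boolean is false when no rule applies at all.
func resolveTargets(rules []Rule, bucket string) ([]string, bool) {
	var fallback []string
	hasFallback := false
	for _, r := range rules {
		if r.BucketName == bucket {
			return r.Targets, true
		}
		if r.BucketName == "*" && !hasFallback {
			fallback, hasFallback = r.Targets, true
		}
	}
	return fallback, hasFallback
}

func main() {
	rules := []Rule{
		{BucketName: "memories", Targets: []string{"digitalocean", "minio"}},
		{BucketName: "*", Targets: []string{"digitalocean"}},
	}
	for _, bucket := range []string{"memories", "photos"} {
		targets, ok := resolveTargets(rules, bucket)
		fmt.Println(bucket, targets, ok)
	}
}
```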

@oxyno-zeta
Owner

Hello @TheDeveloper10 ,

For replication:

I just found an old project that can help you for replication: https://github.com/scality/cloudserver

This will do the replication part on multiple backends.
Adding a database to this project isn't something I have in mind for the moment.
What do you think about the Scality Zenko project?


For encryption:

What do you think about something like this for the configuration?

encryption:
  enc1:
    algorithm: XXX
    publicKey: # Secret
      path: YYY
    ....

targets:
  t1:
    bucket:
      name: JJJJ
    encryptionRules:
      - path: /key1/key2/.* # S3 Key path
        encryption: enc1 # Encryption key from map
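
As a rough sketch of how such `encryptionRules` could be evaluated, the Go snippet below picks the encryption entry for an S3 key. `EncryptionRule`, `mustRule` and `encryptionFor` are hypothetical names; treating `path` as a regular expression that must match the whole key, and letting the first matching rule win, are assumptions suggested by the `.*` in the example rather than decided behavior.

```go
package main

import (
	"fmt"
	"regexp"
)

// EncryptionRule mirrors one entry of the proposed `encryptionRules` list.
type EncryptionRule struct {
	Path       *regexp.Regexp // pattern for the S3 key path
	Encryption string         // name of an entry in the `encryption` map
}

// mustRule compiles the pattern anchored on both sides, so it has to match
// the entire key. It panics on an invalid pattern.
func mustRule(pattern, encryption string) EncryptionRule {
	return EncryptionRule{
		Path:       regexp.MustCompile("^(?:" + pattern + ")$"),
		Encryption: encryption,
	}
}

// encryptionFor returns the encryption entry of the first rule that matches
// the key. The boolean is false when no rule matches.
func encryptionFor(rules []EncryptionRule, key string) (string, bool) {
	for _, r := range rules {
		if r.Path.MatchString(key) {
			return r.Encryption, true
		}
	}
	return "", false
}

func main() {
	rules := []EncryptionRule{mustRule("/key1/key2/.*", "enc1")}
	for _, key := range []string{"/key1/key2/report.pdf", "/public/logo.png"} {
		if name, ok := encryptionFor(rules, key); ok {
			fmt.Printf("%s: encrypt with %s\n", key, name)
		} else {
			fmt.Printf("%s: no rule matched\n", key)
		}
	}
}
```

What should happen to keys that match no rule (store as-is, or reject the write) is left open here, as it is in the proposal.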
