# GSoC 2024
Welcome to the S3cmd Google Summer of Code 2024 projects page.
We are quite open and don't require a lot of formalities for you to apply for a GSoC project with us.
Below, you will find more information to help you determine whether we could be a good fit for each other.
What we expect from candidates:
- A good knowledge of Python
- A basic knowledge of Git and GitHub
- An understanding of what an API is and how to interact with a server
- Comfort with command line tools
- Curiosity
Having some experience dealing with various versions of Python running on multiple operating systems (Linux, macOS, Windows) would be a great plus.
Previous experience with s3cmd, S3, object storage, or cloud services is NOT REQUIRED to apply, but would be appreciated.
The subject is usually fun and easy to pick up when you are new to it.
Note:
Although s3cmd is a client for "object storage services", you can expect to develop and test it at little or no cost:
- s3cmd is written entirely in Python with few basic dependencies and doesn't require compilation, so very little computing power is needed.
- A small S3-compatible server can easily be run locally (see the sketch after this list).
- Cloud object storage services usually offer comfortable "free tiers".
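For illustration, here is a minimal sketch of exercising a locally-run S3-compatible server; it uses boto3 rather than s3cmd purely for brevity, and assumes a MinIO instance listening on localhost:9000 with its default "minioadmin" credentials:

```python
# Minimal sketch: talking to a locally-run S3-compatible server.
# Assumes a MinIO instance on localhost:9000 with the default
# "minioadmin" credentials; uses boto3 only for brevity.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",  # local server: no cloud costs
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)

s3.create_bucket(Bucket="test-bucket")
s3.put_object(Bucket="test-bucket", Key="hello.txt", Body=b"hello")
print([o["Key"] for o in s3.list_objects_v2(Bucket="test-bucket")["Contents"]])
```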
You can find a list of suggested project ideas below ([link](#idea-list)), but we also encourage candidates to come up with their own project idea.
## How to apply
- Try to understand the project and, ideally, give s3cmd a try
- Read the GSoC timeline and the contributor responsibilities to ensure your eligibility
- (Recommended) Open a new issue here with the "[GSoC2024]" tag in the title to introduce yourself:
- Who are you?
- What is your background?
- In which country are you located? Which Timezone?
- What is your motivation to become a contributor for the S3cmd organization?
- Which projects are you interested in and why?
- What is your projected availability during the program to complete the project?
- Submit your application to the Google system before the deadline on April 2 (18:00 UTC). All applications must go through Google's application system; we can't accept any application unless it is submitted there.
Feel free to send an email to florent AT sodria.com if you want to talk privately, ask questions, or discuss a possible application.
## Idea List

### Improve the local files cache

To be able to synchronize local and remote files, we have to compare the file "hash" on both sides.
This requires us to do an expensive "recalculation" of the "hash" of local files at each run.
Performance can be improved a lot by using a cache of local files to avoid this recalculation.
Currently, s3cmd has a "cache" feature, but the implementation is very inefficient:
it is a single raw text file produced by "pickle" marshaling of the in-memory file list.
We could considerably improve the performance, reliability, and memory usage of s3cmd by using a single-file local database to store this cache information.
In addition, a limitation of the S3 protocol regarding big files (i.e. files uploaded as "multipart") prevents us from retrieving the "hash" of such remote files: the ETag of a multipart object is not the MD5 of its content.
If the new cache system could record some info about the remote side, the performance could be boosted even more.
For the local single-file database, we could use for example one of: SQLite3, MDB, LMDB. A rough sketch of the idea follows.
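As an illustration only (not existing s3cmd code), a SQLite-backed hash cache could look like the sketch below; the table schema and the size+mtime invalidation rule are assumptions:

```python
# Hypothetical sketch of a SQLite-backed local hash cache.
# Schema and invalidation policy (size + mtime) are assumptions,
# not the current s3cmd implementation.
import hashlib
import os
import sqlite3

class HashCache:
    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS files "
            "(path TEXT PRIMARY KEY, size INTEGER, mtime REAL, md5 TEXT)"
        )

    def md5(self, path):
        st = os.stat(path)
        row = self.db.execute(
            "SELECT size, mtime, md5 FROM files WHERE path = ?", (path,)
        ).fetchone()
        # Reuse the cached hash only if size and mtime are unchanged.
        if row and row[0] == st.st_size and row[1] == st.st_mtime:
            return row[2]
        # Otherwise, recalculate and store the hash.
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                h.update(chunk)
        digest = h.hexdigest()
        self.db.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
            (path, st.st_size, st.st_mtime, digest),
        )
        self.db.commit()
        return digest
```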
### Service "profiles"

Originally, s3cmd was developed to interact with the Amazon S3 service.
Little by little, many other cloud services appeared offering an S3-compatible interface.
In addition, many open-source or proprietary self-hostable servers were also created with S3-compatible interfaces.
Sadly, so far, s3cmd has stayed a "one size fits all" application, targeting the lowest common denominator of API usage across all servers and services.
For example, we expect all services to use "MD5" for "file hash" calculation.
Other differences might be different APIs or API versions for some endpoints, depending on the service.
The purpose of this project is to offer users a new configuration option allowing them to select a service "profile".
Each profile would come with a predetermined preset of "feature flags" controlling dynamic behavior.
For example, you could have a profile for "aws", "gcs", "digitalocean", "scaleway", "ibmcos", "minio", "radosgw", ...
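As a purely illustrative sketch of the idea, a profile could be a preset of feature flags; the flag names and the per-service values below are placeholders, not an existing s3cmd API:

```python
# Illustrative sketch only: what a service profile preset could look like.
# Flag names and per-service values are placeholders for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceProfile:
    name: str
    hash_algorithm: str = "md5"     # hash used for file comparison
    supports_versioning: bool = True
    list_objects_v2: bool = True    # whether the v2 listing API is available

PROFILES = {
    "aws": ServiceProfile(name="aws"),
    "minio": ServiceProfile(name="minio"),
    # ... one entry per supported service
}

def get_profile(name: str) -> ServiceProfile:
    # Unknown services fall back to a lowest-common-denominator profile.
    return PROFILES.get(name, ServiceProfile(name="generic",
                                             supports_versioning=False,
                                             list_objects_v2=False))
```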
### Command line auto-completion

It would be nice to have proper scripts for Bash and Zsh providing command line auto-completion for s3cmd.
They should auto-complete commands, but also be able to retrieve "remote" path suggestions when possible; a rough sketch of one possible remote-path helper follows.
This project would probably require more "shell" skills than "Python" skills.
Related: https://github.com/s3tools/s3cmd/issues/985 , https://github.com/s3tools/s3cmd/issues/1092
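One hypothetical approach for the remote-path part: a small helper that asks `s3cmd ls` for candidate keys and feeds them to the shell's completion machinery. The helper itself and its output parsing are assumptions; only `s3cmd ls` is a real command:

```python
# Hypothetical completion helper: given a partial "s3://..." argument,
# list candidate remote paths via "s3cmd ls". The approach (parsing the
# last whitespace-separated field) is an assumption and would not handle
# keys containing spaces.
import subprocess
import sys

def complete_remote(prefix: str) -> list[str]:
    out = subprocess.run(
        ["s3cmd", "ls", prefix],
        capture_output=True, text=True, check=False,
    ).stdout
    # Each output line ends with an "s3://..." path.
    return [line.split()[-1] for line in out.splitlines() if line.strip()]

if __name__ == "__main__":
    # A bash completion function could call this script and feed the
    # output to COMPREPLY.
    print("\n".join(complete_remote(sys.argv[1])))
```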
### Support of file versioning

An S3 bucket can have the "versioning" option enabled to preserve previous versions of files after each modification.
But, currently, we don't provide any way to access or manipulate previous versions of files.
The purpose of this project would be to add support for versioning to most of the commands where it makes sense.
Related: https://github.com/s3tools/s3cmd/issues/341
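For reference, the underlying S3 calls involved look like this (shown with boto3 for brevity; s3cmd builds these requests itself, and the bucket/key names are placeholders):

```python
# Sketch of the S3 versioning calls a versioning-aware command would use.
import boto3

s3 = boto3.client("s3")

# "GET /?versions" lists every version of every matching key.
versions = s3.list_object_versions(Bucket="my-bucket", Prefix="report.csv")
for v in versions.get("Versions", []):
    print(v["Key"], v["VersionId"], v["IsLatest"])

# A specific version is fetched with the "versionId" query parameter.
old = s3.get_object(Bucket="my-bucket", Key="report.csv",
                    VersionId=versions["Versions"][-1]["VersionId"])
print(old["Body"].read()[:80])
```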
### Rework of the command line parser and help

s3cmd supports a huge number of commands and flags on the command line.
After so many features have been added over the years, the help output is really crowded, and it can be hard for a new user to understand how to use a command or which flags are relevant.
The purpose of this project would be to rework the parser and re-organise commands, possibly grouping them, in order to provide proper per-command "help" that is not crowded with irrelevant flags.
Related: https://github.com/s3tools/s3cmd/issues/1035
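As one possible direction (an assumption, not a decided design: s3cmd currently uses a flat option parser), per-command help could be built with argparse subparsers so that each command only advertises its own flags:

```python
# Sketch of per-command help using argparse subparsers.
# "sync", "ls" and "--delete-removed" are real s3cmd commands/flags;
# the migration to argparse itself is only one possible approach.
import argparse

parser = argparse.ArgumentParser(prog="s3cmd")
sub = parser.add_subparsers(dest="command", metavar="COMMAND")

sync = sub.add_parser("sync", help="synchronize a directory tree with a bucket")
sync.add_argument("--delete-removed", action="store_true",
                  help="delete remote objects with no local counterpart")

ls = sub.add_parser("ls", help="list buckets or objects")
ls.add_argument("uri", nargs="?", help="optional s3://BUCKET[/PREFIX]")

# "s3cmd sync --help" would then show only sync-related flags.
args = parser.parse_args()
```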
### Other ideas

To be completed: more project ideas will be added.
Additionally, you can have a look at the open issues with the "feature-request" label to find alternative project ideas: https://github.com/s3tools/s3cmd/issues?q=is%3Aopen+is%3Aissue+label%3Afeature-request