Thank you for you interest in contributing to the S3 metadata tagger! To ensure a frictionless entry into our project, please have a quick look at this document before participating in its development:
s3-metadata-tagger is a library to add metadata to objects stored in s3.
Any larger code contribution probably falls into one of the two following categories:
- Improvements to object_tagger.py. It is responsible for downloading files and invoking metadata extractors. While the execution flow is largely finished, it is always open for improvements and refinements.
- Additions of new metadata extractors. Those sub-packages extract meta-information from files (like pages in a pdf). Any addition of extractors is always welcome, but if its dimensions grow above a certain size, consider moving it to a seperate repo and deploying it as a library which interfaces to this package (An example for this would be if the extractor depended on certain software being installed at the host)
To avoid deployment/quality issues, we are employing a wide array of tools:
To ensure everyone is on the same python version, we recommend using pyenv, which reads the
local .python-version file and locally provides the configured version.
We furthermore recommend to create a local virtual environment to encapsulate all installed packages to the s3-metadata-tagger
project directory.
Please do not change the python version
without previous communication with the projects maintainers.
Testing is currently done with the unittest framework. If more complicated tests are needed in your eyes, feel free to recommend an additional framework.
We are using pylint for code linting.
For packaging, setuptools is used. Since deployment is done via GitHub actions (see deploy.yml), you do not need to install the tool locally. If a new version of the package is released, it has to be adjusted in pyproject.toml
To avoid including vulnerable packages, both trivy and OWASP Dependency-Check and executed on every commit to an open pull request. They are not really cut out for our use case, but should serve so far. If you have any better suggestions, feel free to open an issue for it!
Naturally, we use git to do version control and to manage source code extensions.
When developing, please create a new branch from the staging
branch.
The branch name starts with the class of the branch, followed by a
slash and a one- or two-worded identifier for the branch.
The classes are as follows:
Prefix | Description | Example |
---|---|---|
feature/ | A new feature is being developed. | Parametrizable placeholders |
fix/ | A bug is fixed. | Null Pointer Exception when resolving the same placeholder two times |
refactor/ | We change some existing implementation for stability/performance reasons. | Use Enum-based switch-case for resolving in the ReflectionResolver |
polishing/ | Collection branch of smaller refactoring. | Switch to instanceof pattern matching |
cicd/ | Changes are done to the CI/CD pipeline. | Add testing coverage to sonarqube |
doc/ | Documentation is added or changed . | Clarify examples |
Please try to group commits so that each commit bundles a set of
joint code changes.
For example, assemble one commit Add infrastructure
and one
Add tests
The commit message should follow the guidelines described in
How to write a commit message
by Chris Beams.
We use GitHub to manage the source code, keep track of issues, and handle pull requests.
If your contribution requires some discussion it might be useful
to create an issue first, describing the problem you are facing.
If you already have some path of action in mind you can also open a
WIP:
pull request directly.
Pull requests should be done to the staging
branch.
When opening a pull request, please give it a meaningful name and
description.
Either link to the corresponding issue or describe the problem in the
pull request if it is of a size not warranting opening an issue.
Furthermore, please add the applicable tags and request a review by
a maintainer.
Each PR needs to confirm to the linting and sonarqube requirements, and should add a reasonable set of tests for any added code.
Please also add appropriate logging, so that issues further down the roead can be triaged faster.
If any issues arise with trivy or owasp, please check back with the project maintainers.
The S3 metadata tagger follows the Semantic Versioning versioning scheme.
In short, the version is given as x.y.z
- Bug fixes, which do not change anything about the interface and introduce no new features increase
z
- Versions containing new features, which do not influence the existing interfaces increase the
y
- Releases introducing breaking changes increase the
z
.
There is no fixed release schedule, new versions are released whenever a contained set of changes can be deployed.
Each new version is first deployed on the staging
branch, which releases a new package version to
test.pypi.org.
After successful testing, thestaging
branch is merged into the production
branch, which pushes the package to
pypi.org.
The releases to test.pypi.org and pypi.org are done via github actions (see deploy.yml).
Dependabot is enabled for this repository to keep the github actions and the python dependencies up-to-date.
- The projects README.md
- The projects ARCHITECTURE.md
- pyenv
- The python packaging guide
- The setuptools guide