[target-core] Move the core features and functions in common shared `target-core` package (#35)

* [target-core] Move the core features and functions onto a dedicated `target-core` package

* [target-core] Move the core features and functions onto a dedicated `target-core` package: lint

* [target-core] test moto[s3]

* [target-core] Development Status :: 2 - Pre-Alpha

* [github actions] install target-core from gitlab sources

* [github actions] install target-core from gitlab sources: v0.0.1

* target-core release  0.0.2 bump

* Drop 3.8 support: cannot import name 'to_thread' from 'asyncio' in Python 3.8

* Drop 3.8 support: cannot import name 'to_thread' from 'asyncio' in Python 3.8 2

* [target-core] S3 module

* [target-core] S3 module: bump

* [target-core] S3 module: mypy static typing

* [target-core] S3 module: mypy static typing test

* [target-core] S3 module: mypy static test compliance

* [target-core] S3 module: code moved in src source folder

* [target-core] S3 module: workflow cache disabled

* [target-core] S3 module: workflow cache restored

* [target-core] S3 module: workflow cache disabled

* [target-core] S3 module: cov update

* [target-core] S3 module: tox test

* [target-core] S3 module: tox typing check

* [target-core] target_s3_jsonl renamed as target_s3_json

* [target-core] target_s3_jsonl renamed as target_s3_json: new entry point

* [target-core] target-core bump 0.0.5 -> 0.0.6

* [target-core] gh-action-pypi-publish default branch update to release/v1

* [target-core] Leverage native Loader & improve specs

* [target-core] Concurrent files uploads

* [target-core] Concurrent files uploads comments

* [target-core] Concurrent files uploads test

* [target-core] changelog.md link

* [target-core] Concurrent files removal

* [target-core] Concurrent files removal 2

* [target-core] Concurrent files comments

* [target-core] Asynchronous file upload

* [target-core] Asynchronous file upload :simplified main

* [target-core] Asynchronous file upload: remove concurrency_max

* [target-core] Asynchronous file upload: CHANGELOG update
ome9ax authored Oct 1, 2022
1 parent e21a866 commit 7a2a9d8
Showing 30 changed files with 938 additions and 1,165 deletions.
5 changes: 4 additions & 1 deletion .github/dependabot.yml
@@ -5,9 +5,12 @@ updates:
- package-ecosystem: pip
directory: "/"
schedule:
interval: weekly
interval: monthly
day: monday
timezone: Europe/London
allow:
# Allow only dependencies in the "Production dependency group"
- dependency-type: production
reviewers:
- ome9ax
open-pull-requests-limit: 9
33 changes: 23 additions & 10 deletions .github/workflows/python-package.yml
@@ -20,7 +20,7 @@ jobs:
matrix:
project: ['target-s3-jsonl']
os: [ubuntu-latest] #, macos-latest, windows-latest
python-version: [3.8, 3.9, '3.10']
python-version: [3.9, '3.10']
exclude:
- os: macos-latest
python-version: 3.9
@@ -68,12 +68,28 @@ jobs:
python -m venv venv || virtualenv venv
. venv/bin/activate
pip install --upgrade pip # setuptools
# pip install .[test,lint,static,dist]
pip install tox
- name: Get pip cache dir
id: pip-cache
run: |
echo "::set-output name=dir::$(pip cache dir)"
# - name: Lint with flake8
# run: |
# . venv/bin/activate
# # stop the build if there are Python syntax errors or undefined names
# # exit-zero treats all errors as warnings. The GitHub editor is 255 chars wide
# flake8
# - name: Static typing with mypy
# run: |
# . venv/bin/activate
# mypy
- name: Lint with flake8 & Static typing with mypy
run: |
. venv/bin/activate
TOX_PARALLEL_NO_SPINNER=1 tox --parallel -e lint,static
- name: pip cache
uses: actions/cache@v3
with:
@@ -82,16 +98,12 @@ jobs:
restore-keys: |
${{ runner.os }}-pip-
- name: Lint with flake8
run: |
. venv/bin/activate
# stop the build if there are Python syntax errors or undefined names
# exit-zero treats all errors as warnings. The GitHub editor is 255 chars wide
TOX_PARALLEL_NO_SPINNER=1 tox -e lint
- name: Test
run: |
. venv/bin/activate
TOX_PARALLEL_NO_SPINNER=1 tox -e py
# pytest
# tox --parallel
tox -e py
- name: Upload coverage test results to Codecov
uses: codecov/codecov-action@v2
if: |
@@ -109,12 +121,13 @@ jobs:
- name: Build distribution package
run: |
. venv/bin/activate
# python setup.py sdist bdist_wheel
pip install build
python -m build
ls -l dist
- name: Publish distribution package to TestPyPI
if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags')
uses: pypa/gh-action-pypi-publish@master
uses: pypa/gh-action-pypi-publish@release/v1
with:
verify_metadata: true
skip_existing: true
@@ -124,7 +137,7 @@ jobs:

- name: Publish distribution package
if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags')
uses: pypa/gh-action-pypi-publish@master
uses: pypa/gh-action-pypi-publish@release/v1
with:
verify_metadata: true
skip_existing: true
30 changes: 29 additions & 1 deletion CHANGELOG.md
@@ -1,5 +1,33 @@
# Change Log

## [2.0.0](https://github.com/ome9ax/target-s3-jsonl/tree/2.0.0) (2022-09-29)

### What's Changed
⚠️ 🚨 **BREAKING COMPATIBILITY: `Python 3.8` SUPPORT REMOVED** 🚨 ⚠️
* [target-core] Move the core features and functions in common shared [`target-core`](https://gitlab.com/singer-core/target-core) package by @ome9ax in #35

All the core stream processing functionalities are combined into [`target-core`](https://gitlab.com/singer-core/target-core).

#### [`target-core`](https://gitlab.com/singer-core/target-core) core functionalities
- The stream processing library uses [`asyncio.to_thread`](https://docs.python.org/3/library/asyncio-task.html?highlight=to_thread#asyncio.to_thread), introduced in **`Python 3.9`**.
- The Singer stream protocol handling is now isolated by design from the custom output processing. This allows more modular and flexible output processing (API, S3, ...).
- Reads input from `sys.stdin.buffer` rather than `sys.stdin` for more efficient input stream management.
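
To illustrate the `sys.stdin.buffer` point, here is a minimal sketch of reading Singer messages from a binary stream. It is not the `target-core` implementation: `read_messages` is a hypothetical helper, and an in-memory `io.BytesIO` stands in for `sys.stdin.buffer`.

```python
import io
import json
from typing import BinaryIO, Iterator


def read_messages(stream: BinaryIO) -> Iterator[dict]:
    """Yield one decoded JSON message per non-empty line of a binary stream.

    Hypothetical helper for illustration only, not part of target-core.
    """
    for raw_line in stream:  # each item is bytes, no implicit text decoding
        raw_line = raw_line.strip()
        if raw_line:
            # json.loads accepts UTF-8 encoded bytes directly
            yield json.loads(raw_line)


# In a real target the stream would be sys.stdin.buffer.
sample = io.BytesIO(
    b'{"type": "RECORD", "stream": "users", "record": {"id": 1}}\n'
    b"\n"
    b'{"type": "STATE", "value": {}}\n'
)
messages = list(read_messages(sample))
```

Iterating over the binary buffer skips the text-decoding layer that `sys.stdin` applies, and `json.loads` can parse the bytes as they are.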

#### `target-s3-jsonl` changes
- Development of versions `">=2.0"` continues on **`Python 3.9`** and above.
- Versions `"~=1.0"` continue to live on the `legacy-v1` branch.
- Optimised memory and storage management: files are uploaded asynchronously and deleted on the fly, no longer all at once at the end.
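
The "upload asynchronously and delete on the fly" behaviour can be sketched with `asyncio.to_thread` as follows. This is a minimal illustration under stated assumptions, not the package's code: `upload_file` is a hypothetical blocking function standing in for the S3 client call, and the file names are invented.

```python
import asyncio
import tempfile
from pathlib import Path

uploaded: list[str] = []  # records what the fake uploader received


def upload_file(path: Path) -> None:
    """Hypothetical blocking upload; a real target would call its S3 client here."""
    uploaded.append(path.name)


async def upload_and_remove(path: Path) -> None:
    # Run the blocking upload in a worker thread (asyncio.to_thread, Python 3.9+)
    await asyncio.to_thread(upload_file, path)
    # Delete the local file as soon as its own upload has finished
    path.unlink()


async def main(paths: list[Path]) -> None:
    # Uploads run concurrently; each file is removed right after its upload
    await asyncio.gather(*(upload_and_remove(path) for path in paths))


work_dir = Path(tempfile.mkdtemp())
paths = [work_dir / f"users_part_{index:0>3}.json" for index in range(3)]
for path in paths:
    path.write_text('{"id": 1}\n')

asyncio.run(main(paths))
remaining = list(work_dir.iterdir())
```

Because each file is unlinked right after its own upload, local storage is released progressively instead of in one pass at the end of the run.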

#### Config file updates
- Changes (deprecated names are automatically replaced during the deprecation period for backward compatibility):
  - `path_template` replaces `naming_convention` (*deprecated*). There are also a few changes in the `path_template` syntax:
    - `{date_time}` replaces `{timestamp}` (*deprecated*).
    - `{date_time:%Y%m%d}` replaces `{date}` (*deprecated*).
  - `work_dir` replaces `temp_dir` (*deprecated*).
- New option `file_size` for file partitioning by size limit. The `path_template` must contain a `{part}` section for the part number. Example: `"path_template": "{stream}_{date_time:%Y%m%d_%H%M%S}_part_{part:0>3}.json"`.
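
As a rough sketch of the renames above and of how a `path_template` expands, the snippet below uses plain Python string formatting. `migrate_config` and `DEPRECATED_KEYS` are hypothetical names introduced for illustration. The actual replacement logic lives in `target-core` and may differ.

```python
from datetime import datetime

# Hypothetical mapping of deprecated option names to their replacements
DEPRECATED_KEYS = {"naming_convention": "path_template", "temp_dir": "work_dir"}


def migrate_config(config: dict) -> dict:
    """Return a copy of config using the new option names and template tokens."""
    migrated = {DEPRECATED_KEYS.get(key, key): value for key, value in config.items()}
    template = migrated.get("path_template")
    if template:
        migrated["path_template"] = (
            template
            .replace("{date}", "{date_time:%Y%m%d}")  # {date} -> {date_time:%Y%m%d}
            .replace("{timestamp", "{date_time")      # {timestamp...} -> {date_time...}
        )
    return migrated


old_config = {
    "naming_convention": "{stream}/export_date={date}/{timestamp:%H%M%S}.json",
    "temp_dir": "/tmp/work",
}
new_config = migrate_config(old_config)

# Rendering the example template from the changelog with str.format
template = "{stream}_{date_time:%Y%m%d_%H%M%S}_part_{part:0>3}.json"
key = template.format(stream="users", date_time=datetime(2022, 9, 29, 12, 30, 0), part=1)
# key == "users_20220929_123000_part_001.json"
```

The `{part:0>3}` field uses the standard format-spec syntax (fill `0`, right-align, width 3), so part numbers sort lexicographically in the bucket listing.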

**Full Changelog**: https://github.com/ome9ax/target-s3-jsonl/compare/1.2.2...2.0.0

## [1.2.2](https://github.com/ome9ax/target-s3-jsonl/tree/1.2.2) (2022-09-01)

### What's Changed
@@ -17,7 +45,7 @@
## [1.2.0](https://github.com/ome9ax/target-s3-jsonl/tree/1.2.0) (2022-04-11)

### What's Changed
* Upgrade version to 1.1.0: changelog by @ome9ax in https://github.com/ome9ax/target-s3-jsonl/pull/33
* Upgrade version to 1.2.0: changelog by @ome9ax in https://github.com/ome9ax/target-s3-jsonl/pull/33
* [jsonschema] Remove the deprecated custom exception to Handle `multipleOf` overflow fixed in jsonschema v4.0.0 by @ome9ax in https://github.com/ome9ax/target-s3-jsonl/pull/34
* [jsonschema] remove validation exception catching by @ome9ax in https://github.com/ome9ax/target-s3-jsonl/pull/36
* [persist_lines] save_records argument by @ome9ax in https://github.com/ome9ax/target-s3-jsonl/pull/37
2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -1 +1 @@
include requirements.txt LICENSE target_s3_jsonl/logging.conf
include src/target_s3_json/logging.conf LICENSE
38 changes: 25 additions & 13 deletions README.md
@@ -14,6 +14,8 @@ following the [Singer spec](https://github.com/singer-io/getting-started/blob/ma

`target-s3-jsonl` is a [Singer](https://singer.io) Target intended to work with regular [Singer](https://singer.io) Taps. It takes the output of the tap and exports it as [JSON Lines](http://jsonlines.org/) files into an AWS S3 bucket.

This package is built on top of [`target-core`](https://gitlab.com/singer-core/target-core).

## Install

First, make sure Python 3 is installed on your system or follow these
@@ -35,21 +37,21 @@ pip install target-s3-jsonl
python -m venv venv
. venv/bin/activate
pip install --upgrade pip
pip install --upgrade https://github.com/ome9ax/target-s3-jsonl/archive/main.tar.gz
pip install --upgrade git+https://github.com/ome9ax/target-s3-jsonl.git@main
```

### Isolated virtual environment
```bash
python -m venv ~/.virtualenvs/target-s3-jsonl
source ~/.virtualenvs/target-s3-jsonl/bin/activate
pip install target-s3-jsonl
deactivate
~/.virtualenvs/target-s3-jsonl/bin/pip install target-s3-jsonl
```

Alternative
```bash
python -m venv ~/.virtualenvs/target-s3-jsonl
~/.virtualenvs/target-s3-jsonl/bin/pip install target-s3-jsonl
source ~/.virtualenvs/target-s3-jsonl/bin/activate
pip install target-s3-jsonl
deactivate
```

### To run
@@ -83,16 +85,23 @@ For non-profile based authentication set `aws_access_key_id` , `aws_secret_acces

Full list of options in `config.json`:

#### Inherited from `target-core`

| Property | Type | Mandatory? | Description |
|-------------------------------------|---------|------------|---------------------------------------------------------------|
| naming_convention | String | | (Default: None) Custom naming convention of the s3 key. Replaces tokens `date`, `stream`, and `timestamp` with the appropriate values.<br><br>Supports datetime and other python advanced string formatting e.g. `{stream:_>8}_{timestamp:%Y%m%d_%H%M%S}.json` or `{stream}/{timestamp:%Y}/{timestamp:%m}/{timestamp:%d}/{timestamp:%Y%m%d_%H%M%S_%f}.json`.<br><br>Supports "folders" in s3 keys e.g. `folder/folder2/{stream}/export_date={date}/{timestamp}.json`.<br><br>Honors the `s3_key_prefix`, if set, by prepending the "filename". E.g. naming_convention = `folder1/my_file.json` and s3_key_prefix = `prefix_` results in `folder1/prefix_my_file.json` |
| timezone_offset | Integer | | Offset value in hour. Use offset `0` hours is you want the `naming_convention` to use `utc` time zone. The `null` values is used by default. |
| memory_buffer | Integer | | Memory buffer's size used before storing the data into the temporary file. 64Mb used by default if unspecified. |
| temp_dir | String | | (Default: platform-dependent) Directory of temporary JSONL files with RECORD messages. |
| path_template | String | | (Default: None) Custom naming convention of the s3 key. Replaces tokens `date`, `stream`, and `timestamp` with the appropriate values.<br><br>Supports datetime and other python advanced string formatting e.g. `{stream:_>8}_{timestamp:%Y%m%d_%H%M%S}.json` or `{stream}/{timestamp:%Y}/{timestamp:%m}/{timestamp:%d}/{timestamp:%Y%m%d_%H%M%S_%f}.json`.<br><br>Supports "folders" in s3 keys e.g. `folder/folder2/{stream}/export_date={date}/{timestamp}.json`.<br><br>Honors the `s3_key_prefix`, if set, by prepending the "filename". E.g. path_template = `folder1/my_file.json` and s3_key_prefix = `prefix_` results in `folder1/prefix_my_file.json` |
| timezone_offset | Integer | | Offset value in hours. Use an offset of `0` hours if you want the `path_template` to use the `utc` time zone. `null` is used by default. |
| memory_buffer | Integer | | Size of the memory buffer used for non-partitioned files before the data is stored in the temporary file. 64Mb is used by default if unspecified. |
| file_size | Integer | | File partitioning by `size_limit`. File parts will be created. The `path_template` must contain a part section for the part number. Example `"path_template": "{stream}_{date_time:%Y%m%d_%H%M%S}_part_{part:0>3}.json"`. |
| work_dir | String | | (Default: platform-dependent) Directory of temporary JSONL files with RECORD messages. |
| compression | String | | The type of compression to apply before uploading. Supported options are `none` (default), `gzip`, and `lzma`. For gzipped files, the file extension will automatically be changed to `.json.gz` for all files. For `lzma` compression, the file extension will automatically be changed to `.json.xz` for all files. |
| local | Boolean | | Keep the file in the `temp_dir` directory without uploading the files on `s3`. |

#### Specific to `target-s3-jsonl`

| Property | Type | Mandatory? | Description |
|-------------------------------------|---------|------------|---------------------------------------------------------------|
| local | Boolean | | Keep the file in the `work_dir` directory without uploading the files on `s3`. |
| s3_bucket | String | Yes | S3 Bucket name |
| s3_key_prefix | String | | (Default: None) A static prefix before the generated S3 key names. |
| aws_profile | String | | AWS profile name for profile based authentication. If not provided, `AWS_PROFILE` environment variable will be used. |
| aws_endpoint_url | String | | AWS endpoint URL. |
| aws_access_key_id | String | | S3 Access Key Id. If not provided, `AWS_ACCESS_KEY_ID` environment variable will be used. |
@@ -104,17 +113,20 @@ Full list of options in `config.json`:

## Test
Install the tools

```bash
pip install .[test,lint]
```

Run pytest

```bash
pytest -p no:cacheprovider
```

## Lint
```bash
flake8 --show-source --statistics --count --extend-exclude .virtualenvs
```

## Release
1. Update the version number at the beginning of `target-s3-jsonl/target_s3_jsonl/__init__.py`
2. Merge the changes PR into `main`
2 changes: 1 addition & 1 deletion codecov.yml
@@ -1,5 +1,5 @@
ignore:
- target-s3-jsonl/__init__.py
- target-s3-json/__init__.py
- tests/.*
- ./setup.py

9 changes: 0 additions & 9 deletions config.sample.json

This file was deleted.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -39,7 +39,7 @@ strict_equality = true

[[tool.mypy.overrides]] # Overrides for currently untyped modules
module = [
"target_s3_jsonl.*"
"target_s3_json.*"
]

[[tool.mypy.overrides]] # Overrides for currently untyped modules