[coverage] release version 1.0.0 bump 🥳🥂🍾 tests & spec, 100% complete coverage (#17)

* [coverage] release version 1.0.0 bump 🥳🥂🍾 tests & spec, `100%` complete coverage

* [coverage] style

* [coverage] spec
ome9ax authored Aug 18, 2021
1 parent 638b359 commit e20ebc4
Showing 9 changed files with 97 additions and 51 deletions.
28 changes: 23 additions & 5 deletions CHANGELOG.md
@@ -1,7 +1,25 @@
# Change Log

## [1.0.0](https://github.com/ome9ax/target-s3-jsonl/tree/1.0.0) (2021-08-18)
[Full Changelog](https://github.com/ome9ax/target-s3-jsonl/compare/0.0.7...1.0.0)

### Closed issues:
- release version 1.0.0 bump 🥳🥂🍾 tests & spec, `100%` complete coverage

### Merged pull requests:
- [[coverage] release version 1.0.0 bump 🥳🥂🍾 tests & spec, `100%` complete coverage](https://github.com/ome9ax/target-s3-jsonl/pull/17)

## [0.0.7](https://github.com/ome9ax/target-s3-jsonl/tree/0.0.7) (2021-08-18)
[Full Changelog](https://github.com/ome9ax/target-s3-jsonl/compare/0.0.6...0.0.7)

### Closed issues:
- Much more specs and tests, coverage increased to `98.09%`

### Merged pull requests:
- [[coverage] tests & spec, 98.09% further coverage](https://github.com/ome9ax/target-s3-jsonl/pull/16)

## [0.0.6](https://github.com/ome9ax/target-s3-jsonl/tree/0.0.6) (2021-08-17)
[Full Changelog](https://github.com/ome9ax/target-s3-jsonl/tree/0.0.5.2...0.0.6)
[Full Changelog](https://github.com/ome9ax/target-s3-jsonl/compare/0.0.5.2...0.0.6)

### Closed issues:
- Much more specs and tests, coverage increased to `96.91%`
@@ -10,7 +28,7 @@
- [[coverage] bump version 0.0.6 changelog update: Much more specs and tests, coverage increased to `96.91%`](https://github.com/ome9ax/target-s3-jsonl/pull/15)

## [0.0.5.2](https://github.com/ome9ax/target-s3-jsonl/tree/0.0.5.2) (2021-08-13)
[Full Changelog](https://github.com/ome9ax/target-s3-jsonl/tree/0.0.5.1...0.0.5.2)
[Full Changelog](https://github.com/ome9ax/target-s3-jsonl/compare/0.0.5.1...0.0.5.2)

### New features:
- replace `io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')` with `sys.stdin` as it's already natively defined as `<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>`
@@ -19,7 +37,7 @@
- [[readlines] replace `io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')` with `sys.stdin`](https://github.com/ome9ax/target-s3-jsonl/pull/13)

## [0.0.5.1](https://github.com/ome9ax/target-s3-jsonl/tree/0.0.5.1) (2021-08-12)
[Full Changelog](https://github.com/ome9ax/target-s3-jsonl/tree/0.0.5...0.0.5.1)
[Full Changelog](https://github.com/ome9ax/target-s3-jsonl/compare/0.0.5...0.0.5.1)

### Fixed bugs:
- Issue to decompress archived files
@@ -31,7 +49,7 @@
- [[compression] fix compression management](https://github.com/ome9ax/target-s3-jsonl/pull/12)

## [0.0.5](https://github.com/ome9ax/target-s3-jsonl/tree/0.0.5) (2021-08-12)
[Full Changelog](https://github.com/ome9ax/target-s3-jsonl/tree/0.0.4...0.0.5)
[Full Changelog](https://github.com/ome9ax/target-s3-jsonl/compare/0.0.4...0.0.5)

### New features:
- Rows are now stored in an in-memory array and written to the file in batches. The default batch size is 64 MB, configurable with the `memory_buffer` config option.
@@ -46,7 +64,7 @@
- [[Metadata] manage tap Metadata _sdc columns according to the stitch documentation](https://github.com/ome9ax/target-s3-jsonl/pull/9)

## [0.0.4](https://github.com/ome9ax/target-s3-jsonl/tree/0.0.4) (2021-08-09)
[Full Changelog](https://github.com/ome9ax/target-s3-jsonl/tree/0.0.0...0.0.4)
[Full Changelog](https://github.com/ome9ax/target-s3-jsonl/compare/0.0.0...0.0.4)

### New features:
- Initial release
7 changes: 1 addition & 6 deletions README.md
@@ -1,14 +1,11 @@
# target-s3-jsonl

<!-- [![License: Apache2](https://img.shields.io/badge/License-Apache2-blue.svg)](https://opensource.org/licenses/Apache-2.0) -->
<!-- ![PyPI - License](https://img.shields.io/pypi/l/target-s3-jsonl) -->
![GitHub - License](https://img.shields.io/github/license/ome9ax/target-s3-jsonl)
[![Python package builder](https://github.com/ome9ax/target-s3-jsonl/workflows/Python%20package/badge.svg)](https://github.com/ome9ax/target-s3-jsonl)
[![codecov](https://codecov.io/gh/ome9ax/target-s3-jsonl/branch/main/graph/badge.svg?token=KV0cn4jKs2)](https://codecov.io/gh/ome9ax/target-s3-jsonl)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/target-s3-jsonl.svg)](https://pypi.org/project/target-s3-jsonl/)
[![PyPI version](https://badge.fury.io/py/target-s3-jsonl.svg)](https://badge.fury.io/py/target-s3-jsonl)
[![PyPi project installs](https://img.shields.io/pypi/dm/target-s3-jsonl.svg?maxAge=2592000&label=installs&color=%2327B1FF)](https://pypi.org/project/target-s3-jsonl)
<!-- [![PyPi project version](https://img.shields.io/pypi/v/target-s3-jsonl.svg)](https://pypi.org/project/target-s3-jsonl) -->

[Singer](https://www.singer.io/) target that uploads data to S3 in JSONL format
following the [Singer spec](https://github.com/singer-io/getting-started/blob/master/docs/SPEC.md).
@@ -17,8 +14,6 @@ following the [Singer spec](https://github.com/singer-io/getting-started/blob/ma

`target-s3-jsonl` is a [Singer](https://singer.io) Target intended to work with regular [Singer](https://singer.io) Taps. It takes the output of the tap and exports it as [JSON Lines](http://jsonlines.org/) files.

It re-uses [PipelineWise `pipelinewise-target-s3-csv`](https://transferwise.github.io/pipelinewise) s3 client logic to upload the JSONL file directly to S3.

## Install

First, make sure Python 3 is installed on your system or follow these
@@ -104,7 +99,7 @@ Full list of options in `config.json`:
Run pytest

```bash
python -m pytest -p no:cacheprovider
pytest -p no:cacheprovider
```

## License
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,3 +1,3 @@
jsonschema==3.2.0
boto3==1.18.16
boto3==1.18.22
backoff==1.11.1
9 changes: 1 addition & 8 deletions setup.cfg
@@ -2,7 +2,6 @@
name = target-s3-jsonl
version = attr: target_s3_jsonl.__version__
description = Singer.io target for writing JSON Line files and upload to S3
# long_description = file: README.md, CHANGELOG.md, LICENSE
long_description = file: README.md
long_description_content_type = text/markdown
author = Eddy ∆
@@ -11,14 +10,13 @@ url = https://github.com/ome9ax/target-s3-jsonl
keywords = target-s3-jsonl, target-s3-json, singer, singer.io, tap, target, etl, json, jsonl, aws, s3
license = Apache License 2.0
classifiers =
    Development Status :: 4 - Beta
    Development Status :: 5 - Production/Stable
    Operating System :: OS Independent
    License :: OSI Approved :: Apache Software License
    Programming Language :: Python :: 3.8
    Programming Language :: Python :: 3.9

[options]
# zip_safe = False
packages = find:
py_modules = target_s3_jsonl
python_requires = >=3.8
@@ -60,12 +58,7 @@ show_missing = True
skip_covered = False

[flake8]
# filename = .
extend-exclude = venv
# exclude = tests/*
# ignore = E226,E302,E41
# ignore = E9,F63,F7,F82
# ignore = E731,E402,W503,E203,F401,F821
ignore = C901
max-line-length = 160
max-complexity = 10
23 changes: 13 additions & 10 deletions target_s3_jsonl/__init__.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python3

__version__ = '0.0.7'
__version__ = '1.0.0'

import argparse
import gzip
@@ -58,15 +58,16 @@ def add_metadata_values_to_record(record_message, schema_message, timestamp):
def remove_metadata_values_from_record(record_message):
    '''Removes every metadata _sdc column from a given record message
    '''
    keys = {
    for key in {
        '_sdc_batched_at',
        '_sdc_deleted_at',
        '_sdc_extracted_at',
        '_sdc_primary_key',
        '_sdc_received_at',
        '_sdc_sequence',
        '_sdc_table_version'}
    for key in keys:
        '_sdc_table_version'
    }:

        record_message['record'].pop(key, None)

    return record_message['record']
@@ -76,7 +77,7 @@ def emit_state(state):
    if state is not None:
        line = json.dumps(state)
        LOGGER.debug('Emitting state {}'.format(line))
        sys.stdout.write("{}\n".format(line))
        sys.stdout.write('{}\n'.format(line))
        sys.stdout.flush()


@@ -193,7 +194,10 @@ def persist_lines(messages, config):
                validators[stream].validate(float_to_decimal(record_to_load))
            except Exception as ex:
                # NOTE: let anything but 'InvalidOperation' raised Exception slip by
                if type(ex).__name__ == "InvalidOperation":  # TODO pragma: no cover
                # And actual references of the validator logic can be found
                # at https://github.com/Julian/jsonschema/blob/main/jsonschema/_validators.py
                # logic covered in the 'jsonschema' package
                if type(ex).__name__ == "InvalidOperation":  # pragma: no cover
                    LOGGER.error(
                        "Data validation failed and cannot load to destination. RECORD: {}\n"
                        "'multipleOf' validations that allows long precisions are not supported"
@@ -208,8 +212,7 @@ def persist_lines(messages, config):

            file_data[stream]['file_data'].append(json.dumps(record_to_load) + '\n')

            # NOTE: write temporary file
            # Use 64Mb default memory buffer
            # NOTE: write the lines into the temporary file when received data over 64Mb default memory buffer
            if sys.getsizeof(file_data[stream]['file_data']) > config.get('memory_buffer', 64e6):
                save_file(file_data[stream], open_func)

@@ -262,8 +265,8 @@ def main():
    parser.add_argument('-c', '--config', help='Config file', required=True)
    args = parser.parse_args()

    with open(args.config) as input_json:
        config = json.load(input_json)
    with open(args.config) as input_file:
        config = json.load(input_file)

    missing_params = {'s3_bucket'} - set(config.keys())
    if missing_params:
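
A note on the `memory_buffer` check in `persist_lines`: as a general Python fact, `sys.getsizeof` applied to a list reports the size of the list object itself (its array of references), not the strings it holds, so the flush may trigger later than a byte-count reading of "64Mb" would suggest. The following is a minimal sketch of batching by an explicit byte count. The `LineBuffer` class and its `sink` parameter are hypothetical names used for illustration only and are not part of `target-s3-jsonl`.

```python
import io
import json


class LineBuffer:
    """Hypothetical helper: collects JSONL lines and flushes once a byte budget is exceeded."""

    def __init__(self, sink, max_bytes=64e6):
        self.sink = sink            # any object with a writelines() method
        self.max_bytes = max_bytes  # same default value as the `memory_buffer` option
        self.lines = []
        self.size = 0               # running total of encoded bytes held in memory

    def append(self, record):
        line = json.dumps(record) + '\n'
        self.lines.append(line)
        self.size += len(line.encode('utf-8'))
        if self.size > self.max_bytes:
            self.flush()

    def flush(self):
        self.sink.writelines(self.lines)
        self.lines.clear()
        self.size = 0


sink = io.StringIO()
buffer = LineBuffer(sink, max_bytes=30)
for row_id in range(1, 5):
    # Each serialized row is 10 bytes; the fourth append exceeds the 30-byte budget
    buffer.append({'id': row_id})
```

Tracking the size incrementally avoids re-measuring the whole list on every record; whether that trade-off suits this target is a design choice left to the maintainer.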
33 changes: 16 additions & 17 deletions target_s3_jsonl/s3.py
@@ -41,7 +41,7 @@ def create_client(config):
        )
    # AWS Profile based authentication
    else:
        aws_session = boto3.session.Session(profile_name=aws_profile)  # TODO pragma: no cover
        aws_session = boto3.session.Session(profile_name=aws_profile)

    if aws_endpoint_url:
        s3 = aws_session.client('s3', endpoint_url=aws_endpoint_url)
@@ -60,23 +60,22 @@ def upload_file(filename, s3_client, bucket, s3_key,
        # No encryption config (defaults to settings on the bucket):
        encryption_desc = ""
        encryption_args = None
    else:
        if encryption_type.lower() == "kms":
            encryption_args = {"ServerSideEncryption": "aws:kms"}
            if encryption_key:
                encryption_desc = (
                    " using KMS encryption key ID '{}'"
                    .format(encryption_key)
                )
                encryption_args["SSEKMSKeyId"] = encryption_key
            else:
                encryption_desc = " using default KMS encryption"
        else:
            raise NotImplementedError(
                "Encryption type '{}' is not supported. "
                "Expected: 'none' or 'KMS'"
                .format(encryption_type)
    elif encryption_type.lower() == "kms":
        encryption_args = {"ServerSideEncryption": "aws:kms"}
        if encryption_key:
            encryption_desc = (
                " using KMS encryption key ID '{}'"
                .format(encryption_key)
            )
            encryption_args["SSEKMSKeyId"] = encryption_key
        else:
            encryption_desc = " using default KMS encryption"
    else:
        raise NotImplementedError(
            "Encryption type '{}' is not supported. "
            "Expected: 'none' or 'KMS'"
            .format(encryption_type)
        )
    LOGGER.info(
        "Uploading {} to bucket {} at {}{}"
        .format(filename, bucket, s3_key, encryption_desc)
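
The flattened `elif` branch in `upload_file` can be sketched as a standalone function, which makes the three outcomes easy to check without an S3 client. This is an illustration only: `encryption_settings` is a hypothetical name, and the condition of the first branch is an assumption, since the diff does not show the `if` line that precedes the "No encryption config" comment.

```python
def encryption_settings(encryption_type=None, encryption_key=None):
    """Hypothetical helper mirroring the flattened branch: returns (description, extra_args)."""
    # Assumption: the first branch covers a missing type or the literal 'none'
    if encryption_type is None or encryption_type.lower() == 'none':
        # No encryption config (defaults to settings on the bucket)
        return '', None
    if encryption_type.lower() == 'kms':
        args = {'ServerSideEncryption': 'aws:kms'}
        if encryption_key:
            args['SSEKMSKeyId'] = encryption_key
            return " using KMS encryption key ID '{}'".format(encryption_key), args
        return ' using default KMS encryption', args
    raise NotImplementedError(
        "Encryption type '{}' is not supported. "
        "Expected: 'none' or 'KMS'"
        .format(encryption_type)
    )


no_encryption = encryption_settings()
default_kms = encryption_settings('KMS')
keyed_kms = encryption_settings('kms', 'my-key-id')
```

Early returns play the same role as the `elif` chain in the commit: each case sits at one nesting level, and the unsupported type is the fall-through.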
6 changes: 4 additions & 2 deletions tests/resources/aws_credentials
@@ -1,3 +1,5 @@
[dummy]
aws_access_key_id = testing
aws_secret_access_key = testing
aws_access_key_id = a_key
aws_secret_access_key = no_big_secret
aws_security_token = testing
aws_session_token = testing
23 changes: 21 additions & 2 deletions tests/test_init.py
@@ -351,8 +351,8 @@ def test_persist_lines(caplog, config, input_data, input_multi_stream_data, inva
    dummy_type = '{"type": "DUMMY", "value": {"currently_syncing": "tap_dummy_test-test_table_one"}}'
    output_state, output_file_metadata = persist_lines([dummy_type] + input_multi_stream_data, config)

    assert 'WARNING root:__init__.py:252 Unknown message type "{}" in message "{}"'.format(
        json.loads(dummy_type)['type'], dummy_type.replace('"', "'")) + '\n' == caplog.text
    assert caplog.text == 'WARNING root:__init__.py:255 Unknown message type "{}" in message "{}"'.format(
        json.loads(dummy_type)['type'], dummy_type.replace('"', "'")) + '\n'

    with raises(NotImplementedError):
        config_copy = deepcopy(config)
@@ -417,6 +417,25 @@ def test_persist_lines(caplog, config, input_data, input_multi_stream_data, inva

    clear_dir(Path(config['temp_dir']))

    # schema = {
    #     "type": "SCHEMA", "stream": "users", "key_properties": ["id"],
    #     "schema": {
    #         "required": ["id"], "type": "object",
    #         "properties": {"id": {"type": "integer"}}}}

    # record = {"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "X"}}

    # with raises(Exception):
    #     dummy_input_data = deepcopy(input_data)
    #     dummy_schema = deepcopy(schema)
    #     # dummy_schema['schema']['properties']['id']['minimum'] = -2147483648
    #     # dummy_schema['schema']['properties']['id']['maximum'] = 2147483647
    #     dummy_schema['schema']['properties']['id']['multipleOf'] = 64.0
    #     dummy_record = deepcopy(record)
    #     dummy_record['record']['id'] = 9007199254740996e646
    #     dummy_input_data.insert(1, json.dumps(dummy_schema))
    #     output_state, output_file_metadata = persist_lines(dummy_input_data, config)


@mock_s3
def test_main(monkeypatch, capsys, patch_datetime, patch_argument_parser, input_multi_stream_data, config, state, file_metadata):
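
The commented-out test above tries to reach the `InvalidOperation` branch that `persist_lines` marks with `# pragma: no cover`. As background, here is a minimal sketch of one way Python's standard `decimal` module raises that exception during a remainder operation. It uses only the standard library and is not the `jsonschema` code path itself; `is_multiple_of` is a hypothetical helper, and whether the validator performs the same `Decimal` arithmetic is an assumption.

```python
from decimal import Decimal, InvalidOperation, localcontext


def is_multiple_of(value, divisor):
    """Hypothetical check: True when `value` is an exact multiple of `divisor`."""
    return value % divisor == 0


with localcontext() as ctx:
    ctx.prec = 28  # the default precision, set explicitly for clarity
    small_ok = is_multiple_of(Decimal('128'), Decimal('64'))
    try:
        # The integer quotient (1e32) needs more digits than the context precision allows
        is_multiple_of(Decimal('1e30'), Decimal('0.01'))
        raised = None
    except InvalidOperation as ex:
        raised = type(ex).__name__
```

The name captured in `raised` is the same string that the `type(ex).__name__ == "InvalidOperation"` comparison in `persist_lines` looks for.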
17 changes: 17 additions & 0 deletions tests/test_s3.py
@@ -28,6 +28,9 @@ def aws_credentials():
    moto_credentials_file_path = Path('tests', 'resources', 'aws_credentials')
    os.environ['AWS_SHARED_CREDENTIALS_FILE'] = str(moto_credentials_file_path)

    os.environ['AWS_ACCESS_KEY_ID'] = 'that_key'
    os.environ['AWS_SECRET_ACCESS_KEY'] = 'no_big_secret'


@mock_s3
def test_create_client(aws_credentials, config):
@@ -51,6 +54,20 @@ def test_create_client(aws_credentials, config):
    client.put_object(Bucket=config_copy['s3_bucket'], Key='Eddy is', Body='awesome!')
    body = conn.Object(config_copy['s3_bucket'], 'Eddy is').get()['Body'].read().decode("utf-8")

    # NOTE: AWS Profile based authentication
    config_copy = deepcopy(config)
    config_copy['aws_profile'] = 'dummy'
    config_copy.pop('aws_access_key_id')
    config_copy.pop('aws_secret_access_key')
    os.environ.pop('AWS_ACCESS_KEY_ID')
    os.environ.pop('AWS_SECRET_ACCESS_KEY')

    client = create_client(config_copy)
    client.put_object(Bucket=config_copy['s3_bucket'], Key='Look!', Body='No access key!')
    body = conn.Object(config_copy['s3_bucket'], 'Look!').get()['Body'].read().decode("utf-8")

    assert body == 'No access key!'


@mock_s3
def test_upload_file(config):
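
The profile-based test in `test_create_client` removes `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` from `os.environ` directly. One way to keep such a change local to a single test is the standard library's `unittest.mock.patch.dict`, which restores the mapping on exit. The sketch below uses only the standard library; the credential values are placeholders copied from the fixture, and applying this pattern to the test suite is a suggestion, not something the commit does.

```python
import os
from unittest import mock

# Placeholder values standing in for what the `aws_credentials` fixture sets
os.environ['AWS_ACCESS_KEY_ID'] = 'that_key'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'no_big_secret'

with mock.patch.dict(os.environ):
    # Inside the block the variables are removed, as in the profile-based test
    os.environ.pop('AWS_ACCESS_KEY_ID')
    os.environ.pop('AWS_SECRET_ACCESS_KEY')
    present_inside = 'AWS_ACCESS_KEY_ID' in os.environ

# On exit, patch.dict restores the original contents of os.environ
restored_key = os.environ['AWS_ACCESS_KEY_ID']
restored_secret = os.environ['AWS_SECRET_ACCESS_KEY']
```

In a pytest suite, the built-in `monkeypatch.delenv` fixture method serves the same purpose.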