
Allow to provide compressed input #49

Open
REASY opened this issue Mar 22, 2021 · 3 comments

Comments

@REASY

REASY commented Mar 22, 2021

It would be great to have the ability to provide a compressed input file in GZIP/ZIP format.

@binakot
Contributor

binakot commented Apr 29, 2021

This is a good issue.

Currently timescaledb-parallel-copy just splits the input file into batches of rows: https://github.com/timescale/timescaledb-parallel-copy/blob/master/cmd/timescaledb-parallel-copy/main.go#L195. Implementing this feature would require partial decompression and a way to determine where each batch of rows begins and ends.

Fully decompressing the file up front would not help, given that the file may not fit into RAM. Also, without such a mechanism, parallelism will not work, because each worker will not know which piece of data it needs to extract.

@jchampio
Contributor

Is an unzip pipeline helpful enough? E.g.

$ gunzip -c my-data.csv.gz | timescaledb-parallel-copy ...

This will unzip only enough to fill the OS pipe buffer and then wait for the utility to read more. Or is there a particular reason you'd like the utility to handle this internally?

@leonardochen

For reference, the command that works is:

gunzip -c csv.gz | tail -n+2 | timescaledb-parallel-copy ...

-c writes the decompressed output to stdout
tail -n+2 skips the first line (the header row)
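For ZIP archives (the other format mentioned in the issue), a similar pipeline should work with unzip -p, which extracts archive members to stdout; data.zip here is a placeholder name:

```shell
# unzip -p writes the archive contents to stdout instead of extracting to disk;
# tail -n +2 again drops the header row before the copy utility reads the rows.
unzip -p data.zip | tail -n +2 | timescaledb-parallel-copy ...
```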
