Feat hive partitioning #41
This PR forces the id/var directory names to adopt a hive partitioning scheme. Should it be optional? For the partitions, hive partitioning seems to be enforced/hardcoded at https://github.com/ltelab/tstore/blob/feat-hive-partitioning/tstore/archive/ts/writers/pyarrow.py#L77
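For reference, a minimal sketch of what "hive-style" directory names mean in practice: each directory is named `key=value`, so a reader can recover the partition keys from the path alone. The helper name `parse_hive_segments` and the key names in the usage example (`id`, `variable`, `year`) are hypothetical and only illustrate the convention; they are not taken from the tstore code.

```python
from pathlib import PurePosixPath


def parse_hive_segments(path: str) -> dict[str, str]:
    """Collect the key=value directory names found along a path.

    Hypothetical helper for illustration only; not part of tstore.
    """
    partitions = {}
    for part in PurePosixPath(path).parts:
        key, sep, value = part.partition("=")
        # Keep only segments of the form key=value; skip plain names
        # such as the store root or the parquet file name.
        if sep and key and value:
            partitions[key] = value
    return partitions
```

Usage, with an assumed layout: `parse_hive_segments("store/id=station_1/variable=temperature/year=2024/part-0.parquet")` yields the three partition keys and their values as strings.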
c29b057 to d50c06f
I fixed some issues and rebased this so that the PR contains this feature only. Now we can review it. I wonder: is it worth making the hive scheme optional at this point? I suggest we move forward with hive only. We may consider supporting further schemes later on.
I will review this tomorrow or Friday @martibosch. But as a quick thought, I would not enforce
Two further considerations. A TS object / partitioned Parquet dataset is readable by any dataframe/query engine, in any language, that supports reading Parquet files. A TSTORE directory structure with hive partitioning is not readable:
OK, I can make it optional, but I understand that for now we still leave the "hive" time partitioning hardcoded at https://github.com/ltelab/tstore/blob/feat-hive-partitioning/tstore/archive/ts/writers/pyarrow.py#L77?
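To make the "optional" idea concrete, here is a rough sketch of how an id/var-level directory could be built with or without the hive scheme. Everything in it is an assumption for discussion: the function name `ts_dir`, the `partitioning` argument, the default key names `id` and `variable`, and the plain `<id>/<var>` fallback layout are hypothetical and do not describe the current tstore API.

```python
from pathlib import PurePosixPath
from typing import Optional


def ts_dir(
    base: str,
    id_value: str,
    variable: str,
    partitioning: Optional[str] = "hive",
    id_key: str = "id",
    var_key: str = "variable",
) -> str:
    """Build the directory for one (id, variable) time series.

    Hypothetical sketch: "hive" yields key=value directory names,
    None yields plain directory names.
    """
    if partitioning == "hive":
        parts = [f"{id_key}={id_value}", f"{var_key}={variable}"]
    elif partitioning is None:
        parts = [str(id_value), str(variable)]
    else:
        raise ValueError(f"Unsupported partitioning scheme: {partitioning!r}")
    return str(PurePosixPath(base, *parts))
```

With the defaults, `ts_dir("store", "station_1", "temperature")` gives a hive-style path, while passing `partitioning=None` gives the plain one. The time partitioning inside each TS directory is left out here, since it stays hardcoded in the pyarrow writer for now.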
d50c06f to a87ea6f
I amended the first commit to try to make this work not only for tslong but also for tsdf write/load.
a87ea6f to e5f7b83
I have added a second commit drafting what I understand should be the rationale of the
Sorry again for overthinking and for the likely premature optimization, but this is probably a good point to consider whether we need the
Once the above issues are clear we can see how we make the id and var-level hive scheme optional, e.g., allow paths of the form