-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement empty index for 0-rowed columns #1429
Implement empty index for 0-rowed columns #1429
Conversation
…n. Arctic no longer persists the dtype because empty columns are set to be of empty type. Which in Python terms is object dtype.
…ndex until empty index is created.
… tests for the compat-36 and compat-38 versions. The behavior shall be fixed with a later commit
cpp/arcticdb/entity/types_proto.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved from types_proto.hpp
cpp/arcticdb/entity/types_proto.hpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to types_proto.cpp
} | ||
} | ||
|
||
inline bool columns_match(const StreamDescriptor &left, const StreamDescriptor &right) { | ||
if (left.fields().size() != right.fields().size()) | ||
inline bool columns_match(const StreamDescriptor& left, const StreamDescriptor& right) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this method should be symmetrical, but it does not handle the left non-empty/right empty case correctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The right is not supposed to have empty index since modifying operations, except write, with 0 rows don't reach the C++ layer. Should I make it symmetric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine for it to not be symmetric, I would just rename the parameters to make it more obvious they can't be swapped
python/tests/unit/arcticdb/version_store/test_empty_column_type.py
Outdated
Show resolved
Hide resolved
python/tests/unit/arcticdb/version_store/test_empty_column_type.py
Outdated
Show resolved
Hide resolved
python/tests/unit/arcticdb/version_store/test_empty_column_type.py
Outdated
Show resolved
Hide resolved
python/tests/unit/arcticdb/version_store/test_empty_column_type.py
Outdated
Show resolved
Hide resolved
Modify the normalization so that 0-rowed dataframe uniformly return Index([]) with dtype='object'
22e63be
to
81871d9
Compare
…rcticDB into proto_rename
} | ||
} | ||
|
||
inline bool columns_match(const StreamDescriptor &left, const StreamDescriptor &right) { | ||
if (left.fields().size() != right.fields().size()) | ||
inline bool columns_match(const StreamDescriptor& left, const StreamDescriptor& right) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine for it to not be symmetric, I would just rename the parameters to make it more obvious they can't be swapped
#### Reference Issues/PRs Relates to: #1429, #1440 #### What does this implement or fix? Feature flag the use of empty type. It uses the same library config option as the empty column type. In order to preserve behavior empty dataframes will use DataTimeIndex. #### Any other comments? #### Checklist <details> <summary> Checklist for code changes... </summary> - [ ] Have you updated the relevant docstrings, documentation and copyright notice? - [ ] Is this contribution tested against [all ArcticDB's features](../docs/mkdocs/docs/technical/contributing.md)? - [ ] Do all exceptions introduced raise appropriate [error messages](https://docs.arcticdb.io/error_messages/)? - [ ] Are API changes highlighted in the PR description? - [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes? </details> <!-- Thanks for contributing a Pull Request to ArcticDB! Please ensure you have taken a look at: - ArcticDB's Code of Conduct: https://github.com/man-group/ArcticDB/blob/master/CODE_OF_CONDUCT.md - ArcticDB's Contribution Licensing: https://github.com/man-group/ArcticDB/blob/master/docs/mkdocs/docs/technical/contributing.md#contribution-licensing --> --------- Co-authored-by: Vasil Pashov <[email protected]>
#### Reference Issues/PRs Closes: #1428 #### What does this implement or fix? Create an empty-index type. This required change in the Python and in the C++ layer. * In the C++ layer an new index type was added. (IndexDescriptor::EMPTY). It does not allocate a filed in the storage (similar to how row range index does not allocate a field). The checks for index compatibility are relaxed, the empty index is compatible with all other index types and it gets overridden the first time a non-empty index is written (either through update or append). On write we check if the dataframe contains 0 rows and if so it gets assigned an empty index. * The logic in the python layer is dodgy and needs discussion. In the current state the normalization metadata and the index descriptor are stored separately. There is one proto message describing both DateTime index and Ranged Index. The current change made it so that in case of 0 rows the python layer passes RowRange index to the C++ layer which checks if there are any rows in the DF. If there are rows Row range index is used, otherwise empty index is used. Note the `is_not_range_index` proto field. IMO it needs some refactoring in further PRs. It's used in the python layer to check if the first column is index or not. #### Any other comments? Merge this after: #1436 #### Checklist <details> <summary> Checklist for code changes... </summary> - [ ] Have you updated the relevant docstrings, documentation and copyright notice? - [ ] Is this contribution tested against [all ArcticDB's features](../docs/mkdocs/docs/technical/contributing.md)? - [ ] Do all exceptions introduced raise appropriate [error messages](https://docs.arcticdb.io/error_messages/)? - [ ] Are API changes highlighted in the PR description? - [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes? </details> <!-- Thanks for contributing a Pull Request to ArcticDB! Please ensure you have taken a look at: - ArcticDB's Code of Conduct: https://github.com/man-group/ArcticDB/blob/master/CODE_OF_CONDUCT.md - ArcticDB's Contribution Licensing: https://github.com/man-group/ArcticDB/blob/master/docs/mkdocs/docs/technical/contributing.md#contribution-licensing --> --------- Co-authored-by: Vasil Pashov <[email protected]>
#### Reference Issues/PRs Relates to: #1429, #1440 #### What does this implement or fix? Feature flag the use of empty type. It uses the same library config option as the empty column type. In order to preserve behavior empty dataframes will use DataTimeIndex. #### Any other comments? #### Checklist <details> <summary> Checklist for code changes... </summary> - [ ] Have you updated the relevant docstrings, documentation and copyright notice? - [ ] Is this contribution tested against [all ArcticDB's features](../docs/mkdocs/docs/technical/contributing.md)? - [ ] Do all exceptions introduced raise appropriate [error messages](https://docs.arcticdb.io/error_messages/)? - [ ] Are API changes highlighted in the PR description? - [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes? </details> <!-- Thanks for contributing a Pull Request to ArcticDB! Please ensure you have taken a look at: - ArcticDB's Code of Conduct: https://github.com/man-group/ArcticDB/blob/master/CODE_OF_CONDUCT.md - ArcticDB's Contribution Licensing: https://github.com/man-group/ArcticDB/blob/master/docs/mkdocs/docs/technical/contributing.md#contribution-licensing --> --------- Co-authored-by: Vasil Pashov <[email protected]>
Reference Issues/PRs
Closes: #1428
What does this implement or fix?
Create an empty-index type. This required change in the Python and in the C++ layer.
is_not_range_index
proto field. IMO it needs some refactoring in further PRs. It's used in the python layer to check if the first column is index or not.Any other comments?
Merge this after: #1436
Checklist
Checklist for code changes...