Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement empty index for 0-rowed columns #1429

Merged
merged 26 commits into from
Apr 2, 2024

Conversation

vasil-pashov
Copy link
Collaborator

@vasil-pashov vasil-pashov commented Mar 15, 2024

Reference Issues/PRs

Closes: #1428

What does this implement or fix?

Create an empty-index type. This required change in the Python and in the C++ layer.

  • In the C++ layer an new index type was added. (IndexDescriptor::EMPTY). It does not allocate a filed in the storage (similar to how row range index does not allocate a field). The checks for index compatibility are relaxed, the empty index is compatible with all other index types and it gets overridden the first time a non-empty index is written (either through update or append). On write we check if the dataframe contains 0 rows and if so it gets assigned an empty index.
  • The logic in the python layer is dodgy and needs discussion. In the current state the normalization metadata and the index descriptor are stored separately. There is one proto message describing both DateTime index and Ranged Index. The current change made it so that in case of 0 rows the python layer passes RowRange index to the C++ layer which checks if there are any rows in the DF. If there are rows Row range index is used, otherwise empty index is used. Note the is_not_range_index proto field. IMO it needs some refactoring in further PRs. It's used in the python layer to check if the first column is index or not.

Any other comments?

Merge this after: #1436

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved from types_proto.hpp

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to types_proto.cpp

cpp/proto/arcticc/pb2/descriptors.proto Show resolved Hide resolved
python/arcticdb/version_store/_normalization.py Outdated Show resolved Hide resolved
cpp/arcticdb/version/schema_checks.hpp Outdated Show resolved Hide resolved
}
}

inline bool columns_match(const StreamDescriptor &left, const StreamDescriptor &right) {
if (left.fields().size() != right.fields().size())
inline bool columns_match(const StreamDescriptor& left, const StreamDescriptor& right) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this method should be symmetrical, but it does not handle the left non-empty/right empty case correctly

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The right is not supposed to have empty index since modifying operations, except write, with 0 rows don't reach the C++ layer. Should I make it symmetric?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine for it to not be symmetric, I would just rename the parameters to make it more obvious they can't be swapped

Vasil Pashov added 2 commits March 21, 2024 19:25
Modify the normalization so that 0-rowed dataframe uniformly return
Index([]) with dtype='object'
@vasil-pashov vasil-pashov force-pushed the dev/vasil.pashov/empty-index branch from 22e63be to 81871d9 Compare March 28, 2024 16:59
}
}

inline bool columns_match(const StreamDescriptor &left, const StreamDescriptor &right) {
if (left.fields().size() != right.fields().size())
inline bool columns_match(const StreamDescriptor& left, const StreamDescriptor& right) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine for it to not be symmetric, I would just rename the parameters to make it more obvious they can't be swapped

@vasil-pashov vasil-pashov changed the base branch from master to feature/empty_index April 2, 2024 11:53
@vasil-pashov vasil-pashov merged commit a14eb48 into feature/empty_index Apr 2, 2024
108 of 109 checks passed
@vasil-pashov vasil-pashov deleted the dev/vasil.pashov/empty-index branch April 2, 2024 11:55
@vasil-pashov vasil-pashov mentioned this pull request Apr 5, 2024
5 tasks
vasil-pashov added a commit that referenced this pull request Apr 23, 2024
#### Reference Issues/PRs
Relates to: #1429, #1440

#### What does this implement or fix?
Feature flag the use of empty type. It uses the same library config
option as the empty column type. In order to preserve behavior empty
dataframes will use DataTimeIndex.
#### Any other comments?

#### Checklist

<details>
  <summary>
   Checklist for code changes...
  </summary>
 
- [ ] Have you updated the relevant docstrings, documentation and
copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's
features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error
messages](https://docs.arcticdb.io/error_messages/)?
 - [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in
autogenerated release notes?
</details>

<!--
Thanks for contributing a Pull Request to ArcticDB! Please ensure you
have taken a look at:
- ArcticDB's Code of Conduct:
https://github.com/man-group/ArcticDB/blob/master/CODE_OF_CONDUCT.md
- ArcticDB's Contribution Licensing:
https://github.com/man-group/ArcticDB/blob/master/docs/mkdocs/docs/technical/contributing.md#contribution-licensing
-->

---------

Co-authored-by: Vasil Pashov <[email protected]>
vasil-pashov added a commit that referenced this pull request Apr 24, 2024
#### Reference Issues/PRs
Closes: #1428 

#### What does this implement or fix?
Create an empty-index type. This required change in the Python and in
the C++ layer.
* In the C++ layer an new index type was added.
(IndexDescriptor::EMPTY). It does not allocate a filed in the storage
(similar to how row range index does not allocate a field). The checks
for index compatibility are relaxed, the empty index is compatible with
all other index types and it gets overridden the first time a non-empty
index is written (either through update or append). On write we check if
the dataframe contains 0 rows and if so it gets assigned an empty index.
* The logic in the python layer is dodgy and needs discussion. In the
current state the normalization metadata and the index descriptor are
stored separately. There is one proto message describing both DateTime
index and Ranged Index. The current change made it so that in case of 0
rows the python layer passes RowRange index to the C++ layer which
checks if there are any rows in the DF. If there are rows Row range
index is used, otherwise empty index is used. Note the
`is_not_range_index` proto field. IMO it needs some refactoring in
further PRs. It's used in the python layer to check if the first column
is index or not.
#### Any other comments?

Merge this after: #1436

#### Checklist

<details>
  <summary>
   Checklist for code changes...
  </summary>
 
- [ ] Have you updated the relevant docstrings, documentation and
copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's
features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error
messages](https://docs.arcticdb.io/error_messages/)?
 - [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in
autogenerated release notes?
</details>

<!--
Thanks for contributing a Pull Request to ArcticDB! Please ensure you
have taken a look at:
- ArcticDB's Code of Conduct:
https://github.com/man-group/ArcticDB/blob/master/CODE_OF_CONDUCT.md
- ArcticDB's Contribution Licensing:
https://github.com/man-group/ArcticDB/blob/master/docs/mkdocs/docs/technical/contributing.md#contribution-licensing
-->

---------

Co-authored-by: Vasil Pashov <[email protected]>
vasil-pashov added a commit that referenced this pull request Apr 24, 2024
#### Reference Issues/PRs
Relates to: #1429, #1440

#### What does this implement or fix?
Feature flag the use of empty type. It uses the same library config
option as the empty column type. In order to preserve behavior empty
dataframes will use DataTimeIndex.
#### Any other comments?

#### Checklist

<details>
  <summary>
   Checklist for code changes...
  </summary>
 
- [ ] Have you updated the relevant docstrings, documentation and
copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's
features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error
messages](https://docs.arcticdb.io/error_messages/)?
 - [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in
autogenerated release notes?
</details>

<!--
Thanks for contributing a Pull Request to ArcticDB! Please ensure you
have taken a look at:
- ArcticDB's Code of Conduct:
https://github.com/man-group/ArcticDB/blob/master/CODE_OF_CONDUCT.md
- ArcticDB's Contribution Licensing:
https://github.com/man-group/ArcticDB/blob/master/docs/mkdocs/docs/technical/contributing.md#contribution-licensing
-->

---------

Co-authored-by: Vasil Pashov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot append to dataframe with 0 rows
3 participants