-
Notifications
You must be signed in to change notification settings - Fork 655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF-#5017: reset_index
shouldn't trigger index materialization if possible
#5018
Conversation
reset_index
shouldn't trigger index materialization if possible
Codecov Report
@@ Coverage Diff @@
## master #5018 +/- ##
==========================================
+ Coverage 84.91% 89.62% +4.70%
==========================================
Files 267 267
Lines 19747 20048 +301
==========================================
+ Hits 16769 17968 +1199
+ Misses 2978 2080 -898 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
…ization if possible Signed-off-by: Myachev <[email protected]>
6083189
to
e462e97
Compare
@@ -627,7 +627,8 @@ def reset_index(self, **kwargs): | |||
else: | |||
new_self = self.copy() | |||
new_self.index = ( | |||
pandas.RangeIndex(len(new_self.index)) | |||
# Cheaper to compute row lengths than index | |||
pandas.RangeIndex(sum(new_self._modin_frame._row_lengths)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to make QC know that _modin_frame
has _row_lengths
? Same question got hung in #4460.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@YarShev Why not? The modin is based on this mechanism, in addition, this information can be used to advantage in some functions. I think that we can make these functions public (_row_lengths/_column_widths
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am okay with this (link) but @devin-petersohn, @mvashishtha might have different thoughts on the matter. I would like to make sure we are on the same page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I support making row_lengths
and column_widths
public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, let's make those public. Please create an issue for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…ization if possible (modin-project#5018) Signed-off-by: Myachev <[email protected]>
Signed-off-by: Myachev [email protected]
What do these changes do?
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
reset_index
shouldn't trigger index materialization in case whendrop==True
andlevel==None
#5017docs/development/architecture.rst
is up-to-date