PERF-#5017: reset_index shouldn't trigger index materialization if possible #5018

Merged
merged 1 commit into from
Sep 23, 2022

Conversation

anmyachev
Collaborator

@anmyachev anmyachev commented Sep 21, 2022

Signed-off-by: Myachev [email protected]

What do these changes do?

@anmyachev anmyachev changed the title PERF-#5017: reset_index shouldn't trigger index materialization if possible PERF-#5017: reset_index shouldn't trigger index materialization if possible Sep 21, 2022
@codecov

codecov bot commented Sep 21, 2022

Codecov Report

Merging #5018 (e462e97) into master (fb4ed0d) will increase coverage by 4.70%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #5018      +/-   ##
==========================================
+ Coverage   84.91%   89.62%   +4.70%     
==========================================
  Files         267      267              
  Lines       19747    20048     +301     
==========================================
+ Hits        16769    17968    +1199     
+ Misses       2978     2080     -898     
Impacted Files Coverage Δ
...odin/core/storage_formats/pandas/query_compiler.py 96.40% <100.00%> (+0.56%) ⬆️
modin/logging/config.py 94.59% <0.00%> (-1.30%) ⬇️
.../core/execution/dispatching/factories/factories.py 87.90% <0.00%> (ø)
...ive/implementations/omnisci_on_native/db_worker.py
...tal/core/storage_formats/omnisci/query_compiler.py
...on_native/interchange/dataframe_protocol/column.py
...mnisci_on_native/partitioning/partition_manager.py
...entations/omnisci_on_native/dataframe/dataframe.py
...mentations/omnisci_on_native/dataframe/__init__.py
..._on_native/interchange/dataframe_protocol/utils.py
... and 86 more

@anmyachev anmyachev marked this pull request as ready for review September 22, 2022 11:22
@anmyachev anmyachev requested a review from a team as a code owner September 22, 2022 11:22
@@ -627,7 +627,8 @@ def reset_index(self, **kwargs):
         else:
             new_self = self.copy()
             new_self.index = (
-                pandas.RangeIndex(len(new_self.index))
+                # Cheaper to compute row lengths than index
+                pandas.RangeIndex(sum(new_self._modin_frame._row_lengths))
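The idea behind this hunk can be illustrated with a minimal, standard-library-only sketch. Everything here is hypothetical: `ToyFrame` is not Modin's frame class, the built-in `range` stands in for `pandas.RangeIndex`, and the toy computes row lengths eagerly, whereas the code comment in the diff only claims they are cheaper to compute than the index. The sketch shows that the length of the new index can be derived from per-partition row counts without ever touching the row labels.

```python
class ToyFrame:
    """Hypothetical stand-in for a row-partitioned frame (not Modin's class)."""

    def __init__(self, partition_labels):
        # One list of row labels per row partition.
        self._partition_labels = partition_labels
        # Assumption: per-partition row counts are available as cheap metadata.
        self._row_lengths = [len(part) for part in partition_labels]
        # Counts how often the full index had to be assembled.
        self.index_materializations = 0

    @property
    def index(self):
        # Simulates the expensive step: gathering labels from every partition.
        self.index_materializations += 1
        return [label for part in self._partition_labels for label in part]


def new_index_old(frame):
    # Before the change: the length is taken from the materialized index.
    # `range` stands in for pandas.RangeIndex.
    return range(len(frame.index))


def new_index_new(frame):
    # After the change: the length is the sum of the partition row counts.
    return range(sum(frame._row_lengths))


frame = ToyFrame([["a", "b", "c"], ["d", "e"], ["f"]])
assert new_index_new(frame) == range(6)
assert frame.index_materializations == 0  # labels were never gathered
assert new_index_old(frame) == range(6)
assert frame.index_materializations == 1  # old path had to build the index
```

Both paths produce the same positional index; only the old one forces the labels to be collected first.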
Collaborator

Do we really want to make QC know that _modin_frame has _row_lengths? Same question got hung in #4460.

Collaborator Author

@YarShev Why not? Modin is built on this mechanism; in addition, this information can be put to good use in some functions. I think we can make these public (_row_lengths/_column_widths).
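As one illustration of how row lengths could be put to use, here is a hedged sketch that maps a global row position to a partition and an offset inside it. `locate_row` is a hypothetical helper written for this example, not a Modin function; it only assumes a plain list of per-partition row counts.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_row(row_lengths, position):
    """Map a global row position to (partition number, offset inside it)."""
    total = sum(row_lengths)
    if not 0 <= position < total:
        raise IndexError(f"position {position} out of range for {total} rows")
    # ends[i] is the global position one past the last row of partition i.
    ends = list(accumulate(row_lengths))
    # The first partition whose end lies beyond `position` holds the row;
    # empty partitions are skipped because their end equals the previous end.
    part = bisect_right(ends, position)
    start = ends[part - 1] if part else 0
    return part, position - start


# Three partitions holding 3, 2 and 1 rows respectively.
assert locate_row([3, 2, 1], 0) == (0, 0)
assert locate_row([3, 2, 1], 3) == (1, 0)
assert locate_row([3, 2, 1], 5) == (2, 0)
```

The lookup needs only the lengths, so no row labels have to be gathered to answer it.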

Collaborator

I am okay with this (link) but @devin-petersohn, @mvashishtha might have different thoughts on the matter. I would like to make sure we are on the same page.

Collaborator

I support making row_lengths and column_widths public.

Collaborator

Good, let's make those public. Please create an issue for that.

@YarShev YarShev merged commit 33ce061 into modin-project:master Sep 23, 2022
@anmyachev anmyachev deleted the issue5017 branch September 23, 2022 12:17
billiam-wang pushed a commit to billiam-wang/modin that referenced this pull request Sep 23, 2022
Successfully merging this pull request may close these issues.

PERF: reset_index shouldn't trigger index materialization in case when drop==True and level==None
3 participants