Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: Add tests.groupby.methods #55312

Merged
merged 12 commits into from
Oct 12, 2023
Merged

Conversation

rhshadrach
Copy link
Member

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Is this a bad idea? Just moving around code without much value? It has a similar structure to tests.frame.methods. The end goal is to have test_function.py completely removed.

@rhshadrach rhshadrach added Refactor Internal refactoring of code Testing pandas testing functions or related to the test suite Groupby Needs Discussion Requires discussion from core team before further action labels Sep 27, 2023
@mroeschke mroeschke added this to the 2.2 milestone Sep 28, 2023
@mroeschke
Copy link
Member

I think this is a good idea. Always good to have tighter scoped test organization

@lithomas1
Copy link
Member

lithomas1 commented Sep 28, 2023

In general, I like splitting big files, however the downside is that when trying to only run 1 test by hand, the pytest command becomes longer.

In this case, I don't think its worth it if there's just 1 test in a file.

Maybe we can split the test_function.py file into a file for reductions and another file for transformations instead?

@rhshadrach
Copy link
Member Author

rhshadrach commented Sep 28, 2023

In this case, I don't think its worth it if there's just 1 test in a file.

That isn't going to be the final state - but I didn't want to put too much into a PR. I moved the ones that were clearly meant just to test the given method from test_function.py. There will be more.

Maybe we can split the test_function.py file into a file for reductions and another file for transformations instead?

We already have method-specific files for some reductions and transformations. Are you suggesting to consolidate all of the existing ones into these files? E.g.

  • nunique
  • quantile
  • rank
  • size
  • skew
  • min_max
  • nth
  • any_all
  • shift_diff

@lithomas1
Copy link
Member

In this case, I don't think its worth it if there's just 1 test in a file.

That isn't going to be the final state - but I didn't want to put too much into a PR. I moved the ones that were clearly meant just to test the given method from test_function.py. There will be more.

Are you planning on moving other tests in addition to tests from test_function.py?
(I think you've moved over half of the functions from test_function already)

IMO, if there are less than 10 tests in a file, I don't think it'd be worth it to make the new file in general.

Maybe we can split the test_function.py file into a file for reductions and another file for transformations instead?

We already have method-specific files for some reductions and transformations. Are you suggesting to consolidate all of the existing ones into these files? E.g.

  • nunique
  • quantile
  • rank
  • size
  • skew
  • min_max
  • nth
  • any_all
  • shift_diff

Yeah, I think you already do that for the cum* methods (I saw you put them in test_cum.py).

This might make the PR too big though. Some of those files are also really large for some reason (at least min_max can be improved with more parameterization).

@rhshadrach
Copy link
Member Author

I don't think line or test counts per file should be our primary concern. To me, the main thing I want to make easy is the scenario "I'm a new contributor, I am fixing a behavior, where do I put a test?". For this, I think we should prefer a more consistent test organization rather than sacrificing layout for grouping a sufficient number of tests to satisfy some limit.

That said, having too many files in the test suite can definitely be detrimental. I don't think this structure approaches that amount, but that is quite opinionated (as is the rest of this comment 😉 ).

@lithomas1
Copy link
Member

I don't think line or test counts per file should be our primary concern. To me, the main thing I want to make easy is the scenario "I'm a new contributor, I am fixing a behavior, where do I put a test?". For this, I think we should prefer a more consistent test organization rather than sacrificing layout for grouping a sufficient number of tests to satisfy some limit.

The concern for me is typing out the extra characters in the path to the test file when I want to run a single test.
e.g.
pytest pandas/tests/groupby/test_function.py -v -k ... -> pytest pandas/tests/groupby/methods/test_foo.py -v -k ...
(if there's only a couple of tests in that file, then it isn't worth the annoyance for me esp. if I break multiple tests across files and have to type a long path a bunch of times)

This isn't a very strong opinion, though, and the current version in the PR is definitely better than what we have in main.

@@ -0,0 +1,291 @@
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in tests.frame and tests.series with have test_cumulative.py. can we use that pattern? and if we're really trying to follow the pattern, that file goes outside the methods/ directory

assert df.groupby("user")["connections"].mean()["A"] == 3689348814740003840


def test_mean_on_timedelta():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elsewhere we have test_reductions.py, less fine-grained than this

@jbrockmendel
Copy link
Member

the main thing I want to make easy is the scenario "I'm a new contributor, I am fixing a behavior, where do I put a test?

+1. This can be non-trivial for even experienced contributors.

…n_gb_tests_5

# Conflicts:
#	pandas/tests/groupby/test_function.py
@@ -603,11 +603,11 @@ def test_area_lim(self, stacked):
lines = ax.get_lines()
assert xmin <= lines[0].get_data()[0][0]
assert xmax >= lines[0].get_data()[0][-1]
assert ymin == 0
assert ymin == 0, ymin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this intended to be included in this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops - no, that was for another PR. Thanks, reverted.

@mroeschke mroeschke merged commit 9de2a19 into pandas-dev:main Oct 12, 2023
33 checks passed
@mroeschke
Copy link
Member

Thanks @rhshadrach. This is definitely a net positive

@rhshadrach rhshadrach deleted the clean_gb_tests_5 branch October 13, 2023 02:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Needs Discussion Requires discussion from core team before further action Refactor Internal refactoring of code Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants