Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add chunked_filter (#344) #346

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

rafalkrupinski
Copy link

No description provided.

Copy link
Owner

@mahmoud mahmoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking good, would also be good to see an extra test or two for edge cases in test_iterutils.py. Thanks again for this!

@@ -374,6 +374,30 @@ def chunked_iter(src, size, **kw):
return


def chunked_filter(iterable, predicate, chunk_size):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see a docstring here. Looking around at other functions, I noticed that most refer to the predicate as key. Do you mind updating this argument to be consistent?

Copy link
Author

@rafalkrupinski rafalkrupinski Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see various names and since it's not accessing a property of passed objects, I'd find key in this context confusing. builtins.filter uses iterable and function. Actually it's filter(function, iterable) (reverse order). Since it's a variation of that, how about using these names and order?

chunked_filter(function, iterable, size)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think the prime reference for this should be the sorted() function, where key is used. min() and max() also use key, I think. So, tldr. key should be preferred here. :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think the prime reference for this should be the sorted() function, where key is used.

There's a good reason why sorted and min/max use key and filter & filterfalse don't. When you compare two objects, you either compare the value itself or its part, hence the key - attribute, only shorter. When you filter a collection, it can be an arbitrary criterium that's external to the object, so you use function (...to be called on an item) or more specifically, predicate.
Also, since it's an extension of builtins.filter, what's the reason for following naming pattern of any other function?

boltons/iterutils.py Outdated Show resolved Hide resolved
boltons/iterutils.py Show resolved Hide resolved
@rafalkrupinski
Copy link
Author

builtins.filter allows None function, in which case an identity function is used. Here it doesn't make much sense from the use case perspective, but perhaps it would from the usability perspective and for consistency?

Copy link
Owner

@mahmoud mahmoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good progress! I like all the edge case handling. I agree that there's no harm in allowing key to be None, except that then you may have to come up with a default for size, based on the ordering. I'm fine either way, so I'll let you make the call. :)

@@ -374,26 +374,50 @@ def chunked_iter(src, size, **kw):
return


def chunked_filter(iterable, predicate, chunk_size):
"""A version of :func:`filter` which will call predicate with a chunk of the iterable.
def chunked_filter(iterable, predicate, size):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, based on the docstring, I think you meant to rename predicate here to key.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure, see previous comments

[0, 2, 4, 6, 8]

In the above example the lambda function is called twice: once with values
0-4 and then for 5-9.

Args:
iterable (Iterable): Items to filter
predicate (Callable): Predicate function
chunk_size (int): The maximum size of chunks that will be passed the
predicate (Callable): Bulk predicate function that accepts a list of items
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

predicate -> key to match description above?


allow_list = list(allow_iter)
if len(allow_list) != len(src_):
raise ValueError('expected the iterable from key(src) has the same length as the passed chunk of items')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"has the same" -> "to have the same"

I think it's great you're thinking about this. For exceptions, these days, I usually recommend the following format to maximize debuggability:

raise ValueError("chunked_filter expected key func to return an iterable of length {src_len}, got {allow_list_len}").

Similar changes could be made to other exception messages above just by adding ", not {actually_received_type}"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants