max_keys and marker support for get_bucket #88

Merged: 15 commits merged into master from the 87.get_bucket-max_keys branch on Jan 3, 2018

Conversation

@exarkun (Member) commented Jan 2, 2018:

Fixes #87

Note that the issue only mentions max_keys, but fixing is_truncated is pretty important once max_keys is supported, so I did that as well. Then I noticed marker was still unsupported, so I couldn't resist implementing that too.

@markrwilliams (Member) left a comment:

Thanks for improving txaws.testing! The substance of this PR looks fine. I believe the news fragment points to the wrong class, and there's a typo in a comment. Please fix these issues and consider the other comments in this review before merging.

@@ -94,23 +95,45 @@ def delete_bucket(self, bucket):
         return succeed(None)

     @_rate_limited
-    def get_bucket(self, bucket, prefix=None):
+    def get_bucket(self, bucket, marker=None, max_keys=None, prefix=None):
@markrwilliams (Member):

Why not set max_keys to 1000 here? That would make it discoverable via introspection and documentation if we ever set up a documentation generation system.

@exarkun (Member, Author):

I much prefer the idiom of "None means the default". That way, if you ever want the default behavior and you find it more convenient to pass the argument than not, you can just pass None. You don't have to duplicate the default value (and then diverge when the default changes).

This is pretty much just because of function call syntax. This is unnecessary repetition:

    if have_no_special_value:
        foo()
    else:
        foo(something=special_value)

as compared to:

    foo(something=special_value_which_might_be_none)

I agree that discoverability is good but I don't think it wins out in this case.
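
To make the idiom concrete, here is a minimal standalone sketch (plain Python, not the actual txaws implementation; the list_bucket function and its internals are made up for illustration) of how None-as-default can coexist with max_keys, marker, and is_truncated semantics:

    DEFAULT_MAX_KEYS = 1000  # S3's documented default page size

    def list_bucket(keys, marker=None, max_keys=None, prefix=None):
        # "None means the default": a caller can always pass the keyword
        # arguments and let None fall through to the default behavior,
        # instead of duplicating the default value at every call site.
        if max_keys is None:
            max_keys = DEFAULT_MAX_KEYS
        if prefix is None:
            prefix = u""
        # Results are sorted lexicographically by key; marker selects only
        # keys strictly after it.
        result = sorted(k for k in keys if k.startswith(prefix))
        if marker is not None:
            result = [k for k in result if k > marker]
        is_truncated = len(result) > max_keys
        return result[:max_keys], is_truncated

Usage, to show the paging behaviour:

    keys = [u"%04d" % (i,) for i in range(1003)]
    page, truncated = list_bucket(keys)                    # 1000 keys, truncated=True
    rest, truncated = list_bucket(keys, marker=page[-1])   # remaining 3 keys, truncated=False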

content
for content
in sorted(listing.contents, key=lambda item: item.key)
if content.key.startswith(prefix)
@markrwilliams (Member):

prefix is bytes while keys_after is unicode, but both are compared against content.key. The tests below seem to indicate content.key should be unicode, but whatever the type is, it should be consistent.

@exarkun (Member, Author):

Changed the literals in get_bucket to bytes, at least, which matches the documented type of the parameters (documented on the real implementation of get_bucket). Filed #89 to sort out (at least part of) this mess for real.
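
For anyone hitting this later, a tiny illustration (generic Python, not the txaws code itself) of why the mixed types matter: the same prefix check silently works on Python 2 but fails on Python 3.

    key = b"0001"     # a bytes key, as stored
    prefix = u"00"    # a unicode prefix, as passed by a caller

    try:
        print(key.startswith(prefix))   # True on Python 2 (implicit coercion)
    except TypeError as exc:
        print("Python 3 refuses to mix bytes and text:", exc)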

@@ -101,6 +105,32 @@ def test_objects(self):
"Expected to not find deleted objects in listing {}".format(objects),
)

def test_get_bucket_object_order(self):
"""
The objects returned by C{get_bucket} are sorted lexicographically by their
@markrwilliams (Member):

Should this be L{txaws.testing.s3._MemoryS3Client.get_bucket}?

@exarkun (Member, Author):

The test is run against the in-memory implementation and the AWS implementation. So it's really verifying they both sort like this. In the latter case, it is totally AWS that does the sorting, not us. But I guess I like the docstring as-is where it's not referring to just one implementation but stating something about the interface.
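
As a reminder of what "lexicographically" means in practice (a throwaway example, not part of the test): numeric-looking keys sort by string value, not by number, and that is the ordering both implementations are expected to agree on.

    keys = [u"10", u"2", u"1"]
    print(sorted(keys))   # [u'1', u'10', u'2'] -- string order, not numeric order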

The objects returned by C{get_bucket} are sorted lexicographically by their
key.
"""
bucket_name = str(uuid4())
@markrwilliams (Member):

Bucket names should be unicode, though apparently they can be UTF-8 encoded bytes. I use txaws' test suite as a source of documentation and usage patterns, so while other tests use str for bucket names, it would be nice if they consistently used unicode instead.


def test_get_bucket_max_keys(self):
"""
C{max_keys} can be passed to C{get_bucket} to limit the number of
@markrwilliams (Member):

Should this be L{txaws.testing.s3._MemoryS3Client.get_bucket}?

@exarkun (Member, Author):

As above, I think.

not specified.
"""
max_keys = 1000
bucket_name = str(uuid4())
@markrwilliams (Member):

Same question about unicode bucket names as above.

def created_bucket(ignored):
# Put a bunch ofobjects. The default limit is 1000.
work = (
client.put_object(bucket_name, unicode(i).encode("ascii"))
@markrwilliams (Member):

Same question about the type of S3 keys as above.

client = get_client(self)
d = client.create_bucket(bucket_name)
def created_bucket(ignored):
# Put a bunch ofobjects. The default limit is 1000.
@markrwilliams (Member):

Put a bunch of objects.

for i in range(max_keys + 3)
)
return gatherResults([
cooperate(work).whenDone(),
@markrwilliams (Member):

Splitting the work across 3 CooperativeTasks is consistently slower than gatherResults(list(work)) in my tests. YMMV.

@exarkun (Member, Author):

Limiting the parallelism to 3x (instead of 1000x) drastically helped on my network. Perhaps in part because it's slower? But my thinking was that starting 3 TCP connections instead of 1000 would probably make life easier on somebody's router. I could probably go to 10x or 15x without much harm and maybe that would recover some of the performance loss as well. Of course, it will always be slower to limit the parallelism than to not limit it when the code is run on a sufficiently fast/capable network.

@exarkun (Member, Author):

Raising it past 5x causes it to begin failing on my network. Gonna just leave it at 3. :/
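
For reference, the limited-parallelism pattern being discussed here appears to be the standard Twisted idiom of sharing one iterator of Deferred-returning work between a handful of CooperativeTasks. A sketch under that assumption (the parallel helper below is illustrative, not the exact code in this PR):

    from twisted.internet.defer import gatherResults
    from twisted.internet.task import cooperate

    def parallel(work, parallelism=3):
        # Each CooperativeTask pulls the next item from the shared iterator
        # as soon as its previous Deferred fires, so at most `parallelism`
        # requests are in flight at any time.
        work = iter(work)
        return gatherResults([
            cooperate(work).whenDone() for _ in range(parallelism)
        ])

    # Hypothetical usage mirroring the test: upload max_keys + 3 objects,
    # at most three PUTs at a time.
    work = (
        client.put_object(bucket_name, unicode(i).encode("ascii"))
        for i in range(max_keys + 3)
    )
    d = parallel(work, parallelism=3)

Raising parallelism trades network friendliness for throughput, which is exactly the trade-off described in the comments above.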

@@ -0,0 +1 @@
txaws.s3.client.S3Client.get_bucket now accepts ``max_keys`` and ``marker`` parameters for controlling paging through objects.
@markrwilliams (Member):

S3Client.get_bucket has accepted markers and max_keys for a while. This should be txaws.testing.s3.S3Client.

@codecov (bot) commented Jan 3, 2018:

Codecov Report

Merging #88 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master      #88   +/-   ##
=======================================
  Coverage   96.54%   96.54%           
=======================================
  Files          76       76           
  Lines        8820     8820           
  Branches      666      666           
=======================================
  Hits         8515     8515           
  Misses        181      181           
  Partials      124      124

Continue to review full report at Codecov.

Powered by Codecov. Last update 31b6eaf...3539dae.

@exarkun merged commit 1536991 into master on Jan 3, 2018.
@exarkun deleted the 87.get_bucket-max_keys branch on Jan 3, 2018 at 15:48.
Successfully merging this pull request may close: Support max_keys in the S3 in-memory test double (#87)