max_keys and marker support for get_bucket #88
Conversation
it needs more time to really talk to AWS for all those objects
marker doesn't make any sense without this; also, it's what AWS S3 does
Thanks for improving txaws.testing! The substance of this PR looks fine. I believe the news fragment points to the wrong class, and there's a typo in a comment. Please fix these issues and consider the other comments in this review before merging.
@@ -94,23 +95,45 @@ def delete_bucket(self, bucket):
         return succeed(None)

     @_rate_limited
-    def get_bucket(self, bucket, prefix=None):
+    def get_bucket(self, bucket, marker=None, max_keys=None, prefix=None):
Why not set max_keys to 1000 here? That would make it discoverable via introspection and documentation if we ever set up a documentation generation system.
I much prefer the idiom of "None means the default". That way, if you ever want the default behavior and you find it more convenient to pass the argument than not, you can just pass None. You don't have to duplicate the default value (and then diverge when the default changes).
This is pretty much just because of function call syntax. This is unnecessary repetition:
    if have_no_special_value:
        foo()
    else:
        foo(something=special_value)
as compared to:
foo(something=special_value_which_might_be_none)
I agree that discoverability is good but I don't think it wins out in this case.
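
To make the trade-off concrete, here is a minimal sketch of the "None means the default" idiom as it might look inside the in-memory get_bucket (the _DEFAULT_MAX_KEYS name and the body are illustrative, not the PR's actual code):

    _DEFAULT_MAX_KEYS = 1000  # hypothetical constant mirroring S3's server-side default

    def get_bucket(self, bucket, marker=None, max_keys=None, prefix=None):
        # None means "use the default"; resolving it in one place means
        # callers never duplicate the default value (or diverge from it
        # when the default changes).
        if max_keys is None:
            max_keys = _DEFAULT_MAX_KEYS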
                content
                for content
                in sorted(listing.contents, key=lambda item: item.key)
                if content.key.startswith(prefix)
prefix is bytes while keys_after is unicode, but both are compared against content.key. The tests below seem to indicate content.key should be unicode, but whatever the type is, it should be consistent.
Changed the literals in get_bucket to bytes, at least, which matches the documented type of the parameters (documented on the real implementation of get_bucket). Filed #89 to sort out (at least part of) this mess for real.
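
For illustration only (not code from the PR): on Python 2, a mixed bytes/unicode comparison appears to work until a non-ASCII value shows up, which is why consistency matters here:

    prefix = b"photos/"            # bytes, like get_bucket's literals
    key = u"photos/caf\xe9.jpg"    # unicode, like the keys in the tests

    # Python 2 implicitly decodes the bytes prefix as ASCII, so this
    # returns True here but raises UnicodeDecodeError for a prefix
    # containing non-ASCII bytes.
    key.startswith(prefix)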
@@ -101,6 +105,32 @@ def test_objects(self):
             "Expected to not find deleted objects in listing {}".format(objects),
         )

+    def test_get_bucket_object_order(self):
+        """
+        The objects returned by C{get_bucket} are sorted lexicographically by their
Should this be L{txaws.testing.s3._MemoryS3Client.get_bucket}?
The test is run against the in-memory implementation and the AWS implementation. So it's really verifying they both sort like this. In the latter case, it is totally AWS that does the sorting, not us. But I guess I like the docstring as-is where it's not referring to just one implementation but stating something about the interface.
txaws/testing/s3_tests.py
Outdated
+        The objects returned by C{get_bucket} are sorted lexicographically by their
+        key.
+        """
+        bucket_name = str(uuid4())
Bucket names should be unicode, though apparently they can be UTF-8 encoded bytes. I use txaws' test suite as a source of documentation and usage patterns, so while other tests use str for bucket names, it would be nice if they consistently used unicode instead.
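
A minimal sketch of the suggested change (Python 2; uuid4 comes from the stdlib uuid module the tests already use):

    from uuid import uuid4

    bucket_name = unicode(uuid4())  # unicode, rather than str/bytes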
+    def test_get_bucket_max_keys(self):
+        """
+        C{max_keys} can be passed to C{get_bucket} to limit the number of
Should this be L{txaws.testing.s3._MemoryS3Client.get_bucket}?
As above, I think.
txaws/testing/s3_tests.py
Outdated
+        not specified.
+        """
+        max_keys = 1000
+        bucket_name = str(uuid4())
Same question about unicode bucket names as above.
txaws/testing/s3_tests.py
Outdated
+        def created_bucket(ignored):
+            # Put a bunch ofobjects. The default limit is 1000.
+            work = (
+                client.put_object(bucket_name, unicode(i).encode("ascii"))
Same question about the type of S3 keys as above.
txaws/testing/s3_tests.py
Outdated
+        client = get_client(self)
+        d = client.create_bucket(bucket_name)
+        def created_bucket(ignored):
+            # Put a bunch ofobjects. The default limit is 1000.
Put a bunch of objects.
+                for i in range(max_keys + 3)
+            )
+            return gatherResults([
+                cooperate(work).whenDone(),
Splitting the work across 3 CooperativeTasks is consistently slower than gatherResults(list(work)) in my tests. YMMV.
Limiting the parallelism to 3x (instead of 1000x) drastically helped on my network. Perhaps in part because it's slower? But my thinking was that starting 3 TCP connections instead of 1000 would probably make life easier on somebody's router. I could probably go to 10x or 15x without much harm and maybe that would recover some of the performance loss as well. Of course, it will always be slower to limit the parallelism than to not limit it when the code is run on a sufficiently fast/capable network.
Raising it past 5x causes it to begin failing on my network. Gonna just leave it at 3. :/
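
For readers unfamiliar with the idiom being tuned here, a sketch of how sharing one iterator among N CooperativeTasks caps concurrency (a standard Twisted pattern; the parallel helper is illustrative, not code from this PR):

    from twisted.internet.defer import gatherResults
    from twisted.internet.task import cooperate

    def parallel(work, count):
        # `work` is a single iterator of Deferred-returning calls.  Each
        # of the `count` CooperativeTasks pulls the next item only after
        # its previous Deferred fires, so at most `count` requests are
        # in flight at once.
        return gatherResults([
            cooperate(work).whenDone() for _ in range(count)
        ])

    # e.g. 3-way parallelism over the PUTs in this test:
    # d = parallel(
    #     (client.put_object(bucket_name, unicode(i).encode("ascii"))
    #      for i in range(max_keys + 3)),
    #     3,
    # )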
txaws/newsfragments/87.feature
Outdated
@@ -0,0 +1 @@
+txaws.s3.client.S3Client.get_bucket now accepts ``max_keys`` and ``marker`` parameters for controlling paging through objects.
S3Client.get_bucket has accepted markers and max_keys for a while. This should be txaws.testing.s3.S3Client.
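
Per that correction, the fragment would presumably read (only the class path changes; the rest is the author's wording):

    txaws.testing.s3.S3Client.get_bucket now accepts ``max_keys`` and ``marker`` parameters for controlling paging through objects.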
Codecov Report

@@           Coverage Diff           @@
##           master      #88   +/-  ##
=======================================
  Coverage   96.54%   96.54%
=======================================
  Files          76       76
  Lines        8820     8820
  Branches      666      666
=======================================
  Hits         8515     8515
  Misses        181      181
  Partials      124      124

Continue to review full report at Codecov.
Fixes #87.

Note the issue only mentions max_keys, but fixing is_truncated is pretty important once max_keys is supported, so I did that as well. And then I noticed marker was still unsupported, so I couldn't resist implementing that too.
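
To show how the three features fit together, a hedged sketch of a caller paging through a bucket with marker, max_keys, and is_truncated (assuming the listing exposes a boolean is_truncated and contents items with a key attribute; not code from this PR):

    from twisted.internet.defer import inlineCallbacks, returnValue

    @inlineCallbacks
    def list_all_keys(client, bucket):
        # Fetch pages repeatedly, restarting each request after the
        # last key seen, until the service reports the listing is
        # complete.
        keys = []
        marker = None
        while True:
            listing = yield client.get_bucket(
                bucket, marker=marker, max_keys=1000,
            )
            keys.extend(content.key for content in listing.contents)
            if not listing.is_truncated:
                break
            marker = listing.contents[-1].key
        returnValue(keys)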