max_keys and marker support for get_bucket #88

Merged: 15 commits merged into master from the 87.get_bucket-max_keys branch on Jan 3, 2018

Conversation

@exarkun (Member) commented Jan 2, 2018:

Fixes #87

Note that the issue only mentions max_keys, but fixing is_truncated is pretty important once max_keys is supported, so I did that as well. Then I noticed marker was still unsupported, so I couldn't resist implementing that too.

@markrwilliams (Member) left a comment:

Thanks for improving txaws.testing! The substance of this PR looks fine. I believe the news fragment points to the wrong class, and there's a typo in a comment. Please fix these issues and consider the other comments in this review before merging.

@@ -94,23 +95,45 @@ def delete_bucket(self, bucket):
         return succeed(None)

     @_rate_limited
-    def get_bucket(self, bucket, prefix=None):
+    def get_bucket(self, bucket, marker=None, max_keys=None, prefix=None):
@markrwilliams (Member):

Why not set max_keys to 1000 here? That would make it discoverable via introspection and documentation if we ever set up a documentation generation system.

@exarkun (Member, Author):

I much prefer the idiom of "None means the default". That way, if you ever want the default behavior and you find it more convenient to pass the argument than not, you can just pass None. You don't have to duplicate the default value (and then diverge when the default changes).

This is pretty much just because of function call syntax. This is unnecessary repetition:

    if have_no_special_value:
        foo()
    else:
        foo(something=special_value)

as compared to:

    foo(something=special_value_which_might_be_none)

I agree that discoverability is good but I don't think it wins out in this case.
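
To make the idiom concrete, here is a minimal standalone sketch (plain Python, not the actual txaws implementation; the list_bucket function and its internals are made up for illustration) of how None-as-default can coexist with max_keys, marker, and is_truncated semantics:

    DEFAULT_MAX_KEYS = 1000  # S3's documented default page size

    def list_bucket(keys, marker=None, max_keys=None, prefix=None):
        # "None means the default": a caller can always pass the keyword
        # arguments and let None fall through to the default behavior,
        # instead of duplicating the default value at every call site.
        if max_keys is None:
            max_keys = DEFAULT_MAX_KEYS
        if prefix is None:
            prefix = u""
        # Results are sorted lexicographically by key; marker selects only
        # keys strictly after it.
        result = sorted(k for k in keys if k.startswith(prefix))
        if marker is not None:
            result = [k for k in result if k > marker]
        is_truncated = len(result) > max_keys
        return result[:max_keys], is_truncated

Usage, to show the paging behaviour:

    keys = [u"%04d" % (i,) for i in range(1003)]
    page, truncated = list_bucket(keys)                    # 1000 keys, truncated=True
    rest, truncated = list_bucket(keys, marker=page[-1])   # remaining 3 keys, truncated=False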

content
for content
in sorted(listing.contents, key=lambda item: item.key)
if content.key.startswith(prefix)
@markrwilliams (Member):

prefix is bytes while keys_after is unicode, but both are compared against content.key. The tests below seem to indicate content.key should be unicode, but whatever the type is, it should be consistent.

@exarkun (Member, Author):

Changed the literals in get_bucket to bytes, at least, which matches the documented type of the parameters (documented on the real implementation of get_bucket). Filed #89 to sort out (at least part of) this mess for real.
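
For anyone hitting this later, a tiny illustration (generic Python, not the txaws code itself) of why the mixed types matter: the same prefix check silently works on Python 2 but fails on Python 3.

    key = b"0001"     # a bytes key, as stored
    prefix = u"00"    # a unicode prefix, as passed by a caller

    try:
        print(key.startswith(prefix))   # True on Python 2 (implicit coercion)
    except TypeError as exc:
        print("Python 3 refuses to mix bytes and text:", exc)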

@@ -101,6 +105,32 @@ def test_objects(self):
"Expected to not find deleted objects in listing {}".format(objects),
)

def test_get_bucket_object_order(self):
"""
The objects returned by C{get_bucket} are sorted lexicographically by their
@markrwilliams (Member):

Should this be L{txaws.testing.s3._MemoryS3Client.get_bucket}?

@exarkun (Member, Author):

The test is run against the in-memory implementation and the AWS implementation. So it's really verifying they both sort like this. In the latter case, it is totally AWS that does the sorting, not us. But I guess I like the docstring as-is where it's not referring to just one implementation but stating something about the interface.
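
As a reminder of what "lexicographically" means in practice (a throwaway example, not part of the test): numeric-looking keys sort by string value, not by number, and that is the ordering both implementations are expected to agree on.

    keys = [u"10", u"2", u"1"]
    print(sorted(keys))   # [u'1', u'10', u'2'] -- string order, not numeric order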

The objects returned by C{get_bucket} are sorted lexicographically by their
key.
"""
bucket_name = str(uuid4())
@markrwilliams (Member):

Bucket names should be unicode, though apparently they can be UTF-8 encoded bytes. I use txaws' test suite as a source of documentation and usage patterns, so while other tests use str for bucket names, it would be nice if they consistently used unicode instead.


def test_get_bucket_max_keys(self):
"""
C{max_keys} can be passed to C{get_bucket} to limit the number of
@markrwilliams (Member):

Should this be L{txaws.testing.s3._MemoryS3Client.get_bucket}?

@exarkun (Member, Author):

As above, I think.

not specified.
"""
max_keys = 1000
bucket_name = str(uuid4())
@markrwilliams (Member):

Same question about unicode bucket names as above.

def created_bucket(ignored):
# Put a bunch ofobjects. The default limit is 1000.
work = (
client.put_object(bucket_name, unicode(i).encode("ascii"))
@markrwilliams (Member):

Same question about the type of S3 keys as above.

client = get_client(self)
d = client.create_bucket(bucket_name)
def created_bucket(ignored):
# Put a bunch ofobjects. The default limit is 1000.
@markrwilliams (Member):

Put a bunch of objects.

for i in range(max_keys + 3)
)
return gatherResults([
cooperate(work).whenDone(),
@markrwilliams (Member):

Splitting the work across 3 CooperativeTasks is consistently slower than gatherResults(list(work)) in my tests. YMMV.

@exarkun (Member, Author):

Limiting the parallelism to 3x (instead of 1000x) drastically helped on my network. Perhaps in part because it's slower? But my thinking was that starting 3 TCP connections instead of 1000 would probably make life easier on somebody's router. I could probably go to 10x or 15x without much harm and maybe that would recover some of the performance loss as well. Of course, it will always be slower to limit the parallelism than to not limit it when the code is run on a sufficiently fast/capable network.

@exarkun (Member, Author):

Raising it past 5x causes it to begin failing on my network. Gonna just leave it at 3. :/
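
For reference, the limited-parallelism pattern being discussed here appears to be the standard Twisted idiom of sharing one iterator of Deferred-returning work between a handful of CooperativeTasks. A sketch under that assumption (the parallel helper below is illustrative, not the exact code in this PR):

    from twisted.internet.defer import gatherResults
    from twisted.internet.task import cooperate

    def parallel(work, parallelism=3):
        # Each CooperativeTask pulls the next item from the shared iterator
        # as soon as its previous Deferred fires, so at most `parallelism`
        # requests are in flight at any time.
        work = iter(work)
        return gatherResults([
            cooperate(work).whenDone() for _ in range(parallelism)
        ])

    # Hypothetical usage mirroring the test: upload max_keys + 3 objects,
    # at most three PUTs at a time.
    work = (
        client.put_object(bucket_name, unicode(i).encode("ascii"))
        for i in range(max_keys + 3)
    )
    d = parallel(work, parallelism=3)

Raising parallelism trades network friendliness for throughput, which is exactly the trade-off described in the comments above.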

@@ -0,0 +1 @@
txaws.s3.client.S3Client.get_bucket now accepts ``max_keys`` and ``marker`` parameters for controlling paging through objects.
@markrwilliams (Member):

S3Client.get_bucket has accepted markers and max_keys for a while. This should be txaws.testing.s3.S3Client.

@codecov (bot) commented Jan 3, 2018:

Codecov Report

Merging #88 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master      #88   +/-   ##
=======================================
  Coverage   96.54%   96.54%           
=======================================
  Files          76       76           
  Lines        8820     8820           
  Branches      666      666           
=======================================
  Hits         8515     8515           
  Misses        181      181           
  Partials      124      124

Continue to review full report at Codecov.

Powered by Codecov. Last update 31b6eaf...3539dae.

@exarkun merged commit 1536991 into master on Jan 3, 2018.
@exarkun deleted the 87.get_bucket-max_keys branch on Jan 3, 2018 at 15:48.
Successfully merging this pull request may close: Support max_keys in the S3 in-memory test double (#87)