
Optimise dumping to reduce unnecessary overhead #1649

Open · dsimidzija wants to merge 1 commit into dev from dump-performance-tweaks

Conversation

@dsimidzija

NB: Please take this MR as more of a discussion starter than production-ready code, because I hacked a lot of things together.

When dumping many objects, marshmallow calls the same field methods over and over again, and they return the same values each time. Parts of this process need to be called only once per dump, which significantly reduces Python method call overhead.

`Field.get_serializer` returns an optimized serializer for the current dump operation, avoiding expensive lookups for properties that will not change during a single dump (such as `data_key`, `default`, etc.).

The default `Schema.get_attribute` is also bypassed, because all it does is call `utils._get_value_for_key(s)`.
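
To make the shape of the change concrete, here is a minimal sketch of the idea (illustrative only, not the actual diff; `make_serializer` and its dict/attribute fallback are assumptions about how the cached values get used):

```python
# Illustrative sketch, not the actual patch: resolve a field's stable
# properties once per dump, then return a closure that reuses them for
# every object instead of re-reading them per object.
def make_serializer(field, attr):
    data_key = field.data_key or attr  # stable for the whole dump
    default = field.default            # likewise

    def serialize(obj):
        # Plain key/attribute access replaces the generic get_value() path.
        if isinstance(obj, dict):
            value = obj.get(attr, default)
        else:
            value = getattr(obj, attr, default)
        return data_key, field._serialize(value, attr, obj)

    return serialize
```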

Benchmarks show around a 30-35% improvement, which is quite significant even for this hacky patch:

T1: python benchmark.py
  Before: 395.60 usec/dump
  After:  261.04 usec/dump
T2: python benchmark.py --object-count 1000
  Before: 22508.80 usec/dump
  After:  14610.63 usec/dump
T3: python benchmark.py --iterations=5 --repeat=5 --object-count 20000
  Before: 442295.61 usec/dump
  After:  288202.98 usec/dump
T4: python benchmark.py --iterations=10 --repeat=10 --object-count 10000
  Before: 220163.94 usec/dump
  After:  142475.76 usec/dump

My motivation here is that marshmallow is excellent when it comes to schema validation, but according to the benchmarks, there is a lot of overhead in there. The question is, is there a better way of improving serialization performance without sacrificing all the good things about marshmallow?

@deckar01 (Member)

This lines up pretty well with the discussion in #805. I think the main difference is that we also proposed caching logic that depends on the object type. I think the scope of this PR would be a good incremental improvement.

@dsimidzija (Author)

I didn't even know about #805, good to know! My ideas were along similar lines: figure out a way to cache values for at least certain types. For example, if I know that a specific field is a string property and is guaranteed to be present, there should be no need for complicated lookups or type enforcement (see the sketch below). I didn't have time to delve deeper into the various Field subclasses, but line_profiler implies a lot of repetition there.
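
To illustrate the kind of fast path I mean (hypothetical, not part of this MR; `make_string_fastpath` is made up for illustration):

```python
import operator

# Hypothetical fast path: a string attribute that is guaranteed to be
# present needs no missing/default handling and no type coercion, so
# its serializer collapses to a single C-level attribute lookup.
def make_string_fastpath(attr):
    getter = operator.attrgetter(attr)

    def serialize(obj):
        return getter(obj)

    return serialize

# e.g. get_name = make_string_fastpath("name"); get_name(user)
```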

As I'm typing this, I wonder if it would be possible to do something like what lima does and compile a single function which dumps the entire object, where each Field subclass could in theory produce the most optimized serializer for itself.
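
A toy version of that idea, just to show the mechanics (the `compile_dump` helper and its input format are made up for illustration):

```python
# Generate the source of a single dump function covering all fields,
# compile it once with exec(), and call it per object with no
# per-field dispatch at all -- roughly what lima does.
def compile_dump(field_attrs):
    # field_attrs: mapping of output key -> attribute name
    lines = ["def dump(obj):", "    return {"]
    for key, attr in field_attrs.items():
        lines.append(f"        {key!r}: obj.{attr},")
    lines.append("    }")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["dump"]

# e.g. dump = compile_dump({"id": "id", "name": "name"}); dump(user)
```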

@dsimidzija force-pushed the dump-performance-tweaks branch from 4f04fbe to 595b12b on September 25, 2020
@dsimidzija force-pushed the dump-performance-tweaks branch from 595b12b to ee835bf on March 7, 2021
@dsimidzija (Author)

I don't know if this is of any interest, but with some minimal changes marshmallow can be cythonized with setuptools-cythonize, and together with this MR the performance is around 45% better!
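
For reference, the wiring I used was roughly the following (from memory, so take the exact configuration as an assumption rather than a recipe):

```python
# setup.py -- setuptools-cythonize swaps in build commands that compile
# the package's modules with Cython instead of shipping plain .py files.
from setuptools import setup
from setuptools_cythonize import get_cmdclass

setup(
    name="marshmallow",
    cmdclass=get_cmdclass(),
    packages=["marshmallow"],
)
```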

@lafrech (Member) commented Mar 16, 2021

I'm not familiar with cythonize. Does cythonization by itself, without this PR, bring a significant improvement?

Is there anything we could do to make it more easily available? Like, could/should we distribute binary packages? Would it be useful to many users, or is it a niche use case?

@dsimidzija (Author)

I tested it without this MR a long time ago, so I don't have the numbers, but IIRC it brought around a 10% speedup on its own. That should probably be examined in more detail, though.

I'm honestly not sure how niche it is. I started looking into it because I ran into these performance problems when dumping large(ish) datasets with marshmallow, but didn't want to give up the dynamic schemas and validation it provides. I haven't had the time yet, but I was hoping to look into speeding up individual fields as well; I feel like there is a lot of overhead there which should be easy to eliminate without breaking anything.

@dsimidzija force-pushed the dump-performance-tweaks branch from ee835bf to df37ea1 on November 21, 2021
@dsimidzija force-pushed the dump-performance-tweaks branch from df37ea1 to 622d8d0 on November 21, 2021