Use marshal for save/load operators to improve performance #124
Conversation
allows for handling of even numbers of momenta modes in plane wave code
I would suggest not using marshal over HDF5 because:
- It is Python specific
- The format is not guaranteed to be identical across different Python versions
- It is not secure, as I could put malicious data into it
@damiansteiger the first two points are true, but the third isn't - see e.g. https://stackoverflow.com/questions/26931919/marshal-unserialization-not-secure. marshal doesn't allow you to insert arbitrary code (as e.g. pickle / eval would) and there are no known exploits for it. The main reasons we want to change from h5py to marshal are speed (a factor of ~5 faster loading, and ~10+ faster saving), but more significantly memory usage. In the tests I've done with saving/loading larger operators (with >10 M terms), the conversion to HDF5 blows up in memory, getting as high as ~30 GB for a 2 GB FermionOperator. marshal for the same file only uses ~8 GB by comparison (though I'm still interested in lighter solutions if you know of any!). This means that loading the kind of FermionOperators that we'd be interested in saving in the first place is almost impossible with HDF5.
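For context, a minimal sketch of the kind of marshal-based save/load being proposed (the function names and the exact tuple layout here are illustrative assumptions, not the PR's code):

```python
import marshal

def save_operator_sketch(operator_type, terms, file_path):
    # Dump a type tag plus the terms dict (term tuple -> coefficient).
    # Coefficients are cast to native complex, since marshal cannot
    # serialize numpy scalar types.
    with open(file_path, 'wb') as f:
        marshal.dump((operator_type,
                      {k: complex(v) for k, v in terms.items()}), f)

def load_operator_sketch(file_path):
    # marshal.load reconstructs only plain Python objects (tuples, dicts,
    # numbers), so unlike pickle it cannot execute arbitrary code on load.
    with open(file_path, 'rb') as f:
        operator_type, terms = marshal.load(f)
    return operator_type, terms
```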
```
@@ -223,3 +217,19 @@ def get_file_path(file_name, data_directory):
    file_path = data_directory + '/' + file_name

    return file_path


def numpy_scalar(number):
```
What if we were to cast all float types to complex instead? This makes me wonder why we don't just make the coefficients complex when they're defined - is there a reason we need things like numpy.complex128? @damiansteiger @babbush @jarrodmcc
@damiansteiger: while I appreciate your concerns, right now performance is significantly more important for us. While I see that it might be problematic for people to try to share data between different versions of Python or even another language, that is much less of a concern than the fact that we cannot run the calculations we need to run without this pull request, or something similar that dramatically brings down memory usage. Right now the security concerns appear to be hypothetical, as Ian points out. So before we make the new release Sunday night, we'd like to have this PR accepted.
```
                     for k, v in operator.terms.items()],
                    ensure_ascii=False).encode('utf8'))
    tm = operator.terms
    f = open(file_path, 'wb')
```
Should use a context manager here (`with open(file_path, 'wb') as f:`).
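A sketch of what the suggestion amounts to, using the variables from the diff in this thread:

```python
# The file is closed automatically, even if marshal.dump raises.
with open(file_path, 'wb') as f:
    marshal.dump((operator_type, {k: numpy_scalar(tm[k]) for k in tm}), f)
```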
Done.
```
                    ensure_ascii=False).encode('utf8'))
    tm = operator.terms
    f = open(file_path, 'wb')
    marshal.dump((operator_type, {k: numpy_scalar(tm[k]) for k in tm}), f)
```
After looking at this a little more, I'm not sure numpy_scalar will do anything fundamentally different from casting to complex directly - do you have any thoughts on this @Spaceenter? Using complex instead is a little cleaner (no extra function needed) and also a bit faster.
By doing that, as well as using itertools' izip and imap, this line can be made ~25% faster (`dict(izip(iterkeys(tm), imap(complex, itervalues(tm))))`), though it's by no means a bottleneck :p
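A sketch of that suggestion in context (Python 2 spelling; as noted further down in the thread, izip/imap do not exist on Python 3):

```python
from itertools import imap, izip  # Python 2 only; breaks on Python 3
from future.utils import iterkeys, itervalues

# Cast every coefficient to a native complex without building
# intermediate lists of keys and values.
serializable_terms = dict(izip(iterkeys(tm), imap(complex, itervalues(tm))))
```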
In my opinion, the coefficients of both FermionOperator and QubitOperator should always be of native Python complex type when the operators are originally constructed, so that we don't need to apply the transformations/castings at all here. Is it possible to make that change in FermionOperator and QubitOperator?
If not, I agree that casting to complex directly here is cleaner.
@idk3 @babbush @jarrodmcc
I am not sure we can trust that it will always be initialized this way. Even if it is, inevitably people will write things like:
my_fermion_operator *= my_numpy_array[i]
which may cause the coefficient to become a numpy type. So I think we are going to need to cast directly.
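A toy illustration of how a numpy scalar can sneak into a coefficient this way, and why the explicit cast is then needed (marshal only accepts built-in types):

```python
import numpy

coefficient = 1.0                      # native Python float
coefficient *= numpy.array([0.5])[0]   # result is now a numpy.float64
print(type(coefficient))               # <class 'numpy.float64'>

# marshal.dumps(coefficient) would raise "unmarshallable object";
# a plain cast restores a built-in type that marshal accepts.
print(type(complex(coefficient)))      # <class 'complex'>
```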
Done.
@Spaceenter The current failure is due to my mistake - izip isn't available in Python 3 at all, it seems. I think map and zip from future.builtins or future_builtins should do it.
```diff
@@ -153,8 +153,8 @@ def save_operator(operator, file_name=None, data_directory=None):

     tm = operator.terms
     with open(file_path, 'wb') as f:
-        marshal.dump((operator_type, dict(izip(tm.iterkeys(),
-                                               imap(complex, tm.itervalues())))), f)
+        marshal.dump((operator_type, dict(zip(tm.keys(),
```
For the dictionary parts you should still be able to use iterkeys and itervalues from future.utils! i.e. `iterkeys(tm)`.
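A sketch of that cross-compatible spelling, combining the zip/map already switched to in the diff above with the future.utils helpers:

```python
from future.builtins import map, zip    # Python 3-style iterators on both 2 and 3
from future.utils import iterkeys, itervalues

# Avoids building full key/value lists on Python 2 while staying valid on Python 3.
serializable_terms = dict(zip(iterkeys(tm), map(complex, itervalues(tm))))
```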
I was looking at this:
https://stackoverflow.com/questions/10189273/for-a-python-dictionary-does-iterkeys-offer-any-advantages-over-viewkeys
Is it a style preference?
I'm not sure, actually - I thought that Python 3's d.keys() was 2's d.iterkeys(). In fact I didn't know about viewkeys at all. It looks like both should work, but I know that future.utils.iterkeys passes tests for both 2 and 3 - it would be interesting to see if there's anything different with viewkeys, though I don't think we need any of its special properties like the dynamic view.
Seems like tests for both 2 and 3 passed - should we go with this? Or do you want a change?
Should be fine as is! If it's significantly slower I'll make another PR after this one gets through, but I doubt there'll be any issues.
@Spaceenter @idk3 Let me explain in more detail why I have raised objections about using marshal:
I am not yet convinced the performance argument outweighs any of these three points (maybe a little bit point 3 but for sure not 1 and 2). Did you try the following:
@babbush and @idk3 For the time being, you could just copy this code (which uses marshal and solves your immediate performance problem). Is there a specific reason why this util function has to be inside FermiLib?
I understand and respect your objections. However, for us to keep developing FermiLib we need to be able to sometimes strike a compromise between functionality today and purity of engineering practices. Currently, FermiLib is implemented entirely in Python and marshal is passing for all tested versions of Python. While theoretically possible, there are no known security exploits in marshal. Simply put, that is good enough for us. The solution is in some ways suboptimal, so it makes sense to open an issue and eventually work towards a better solution. We can document what is going on with examples and wait for this hypothetical person who is browsing our repository to suggest something better. But currently the memory spikes associated with the current solution are completely unacceptable. We need a new release of FermiLib to happen Sunday night (as we discussed) and to include the marshal solution. Otherwise, we cannot check the new version into Google3 and that will cause us unacceptable delays. Considering that almost all FermiLib pull requests currently come from people involved in this project, I think this is a completely fair reason. Our projects are largely responsible for moving FermiLib forward. So we need to be practical. This solution does work for everyone who actually uses the code today. Please accept this pull request in time for the new release. If you want to discuss further, please call me.
This is an open source project; therefore, technical discussions should happen openly and not behind closed doors (e.g. phone calls), so that everyone using FermiLib can see why certain design decisions have been made. Here is a compromise:
Thanks for the discussions. I actually have another idea to make this work without making it Python specific. Inside Google, almost all large engineering projects use protobuf as a no-brainer choice to represent data models (complex or not), because it is designed to work across different languages and has much better serialization than things like JSON. So instead of using a native Python dictionary to represent FermionOperator and QubitOperator (and a few other objects like InteractionRDM, MoleculeOperator, etc.), we would define a protobuf message for each of them, and then write the serialized protobuf binary to file. https://developers.google.com/protocol-buffers/docs/pythontutorial#parsing-and-serialization This is obviously more complex than marshal, and we'd need some time to develop and test.
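To make the idea concrete, a rough sketch of how the Python side might look, assuming a hypothetical compiled message (the module name fermion_operator_pb2, the message FermionOperatorProto, and all field names below are made up for illustration; nothing like this exists in FermiLib yet):

```python
import fermion_operator_pb2  # hypothetical module generated by protoc

def save_operator_proto(operator, file_path):
    # Hypothetical message layout: repeated terms, each holding
    # (index, action) factors and the coefficient's real/imaginary parts.
    proto = fermion_operator_pb2.FermionOperatorProto()
    for term, coefficient in operator.terms.items():
        entry = proto.terms.add()
        entry.coefficient_real = complex(coefficient).real
        entry.coefficient_imag = complex(coefficient).imag
        for index, action in term:
            factor = entry.factors.add()
            factor.index = index
            factor.action = action
    # SerializeToString() is the standard protobuf API for the compact,
    # language-neutral binary encoding.
    with open(file_path, 'wb') as f:
        f.write(proto.SerializeToString())
```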
@damiansteiger I agree we should try to keep discussions open. I also appreciate your point about it being more difficult to get people to fix issues when they are "mostly working" already and not badly in need of repair (as is the case now). I will not promise to fix this myself. The first reason is that I am really not the right person to do this - I am not aware of any acceptable solutions, and there are other people on this project who would be more suitable (such as yourself or @Spaceenter). The second reason is that this is not a critical issue for me, whereas there are many other features that I'd like to implement. But I will certainly encourage others to try to come up with a better solution before we move out of alpha. @Spaceenter, you just suggested moving to protocol buffers, which sounds like a reasonable fix to me. Is this something you could commit to trying to do in the next few weeks? Hopefully this is enough for you to remove the block. I assume that if @Spaceenter agrees to help fix this, then that will be good enough for you?
Yes, I'll start to have a bit more free time later next week; I'll then try to send a PR with protobuf and work with @idk3 to test the performance. In my opinion, other people who use projectq/fermilib in the future will also benefit from protobuf, as they might use Java, Go, JS, etc. to access the operators, and having protobuf should make that much easier. Anyway, we can discuss once I actually have the PR ready.
Great, thank you @Spaceenter! I agree that moving to protocol buffers makes a lot of sense for the long run. I suppose this should be enough for @damiansteiger to merge this PR.