Performance improvements #697
Comments
Hi @rsoika, I have faced similar performance problems related to deep clones, especially when using a serialization approach, which is the most expensive way to do it. I published a research paper at a renowned academic conference (ACM/IEEE International Symposium on Empirical Software Engineering and Measurement) presenting a new way to deal with object clones in Java. I called it Lazy Clone. It clones the graph of objects on demand instead of cloning it all up front, as deep clone strategies do. This strategy improved the performance of an entire real-world system by more than 80%. Here is the PDF of the pre-print of the paper, entitled "Improving performance and maintainability of object cloning with lazy clones: An empirical evaluation": https://bit.ly/3aVm7lh One of the known problems of the lazy clone approach I proposed occurs when the original object is mutable, so it is a trade-off one has to take into consideration when adopting a lazy clone approach. I would be happy to help you with this, and maybe do a pull request. Please let me know if you consider this kind of contribution relevant to this project. Cheers,
Hi Bruno, yes of course, this sounds interesting. The reason I came up with the deepCopy approach was initially the JPA part of the Imixs-Workflow engine. We are dealing with 'generic value objects' called ItemCollection. An ItemCollection contains a Java HashMap storing only serializable objects. The HashMap may contain primitives (copied by value) but also things like byte arrays or even more complex serializable structures. Normally you would expect that a JPA call Ok - this was the initial part of the problem. I don't think that the serialization of the HashMap in this special situation of a long-running JPA transaction is a problem. We need a full copy, so I do not expect a performance boost from a lazy loading approach. And to be honest, this is a very tricky part of the workflow engine where I don't want to make any big changes. But the reason for this issue (#697) is that the So my idea was indeed to provide some new kind of copy method. Your lazy loading seems to be what is needed here! As far as I understand your approach, you are working with reflection to detect getter methods starting with 'get' or 'is'. The ItemCollection itself has a lot of such methods, so I fear this would generate a lot of work. But maybe it is possible to just observe the getItemValue method, which calls 'hash.get(..)' internally? The current implementation of my deepCopy looks like this:
What I have not understood so far is: would you lazy-load the ItemCollection, or just the HashMap inside the ItemCollection? Do you think a '
fair enough, especially if you don't have performance problems in that situation.
that's it! that's what happened in the project where i did a performance consulting job. the deep clone method was initially used to solve a small and isolated problem, but after 5 years of development it had spread all over the system, causing numerous performance issues.
i see. this is the most common way to make a deep clone using a serialization approach. convenient and straightforward. unfortunately it is very expensive. in the article i sent you i analyzed five different clone implementations: three deep and two lazy. you can try another deep clone implementation that is easy to do, the reflection implementation, but according to my benchmarks it is just slightly faster than the serialization implementation. at the time i wrote the article i used the following library https://github.com/kostaskougios/cloning
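For readers following the thread, a serialization-based deep copy usually looks roughly like the sketch below. It is not the ItemCollection code from the project (that snippet is not reproduced in this thread); the class name `SerializationCopy` and the sample key `txtname` are made up for illustration. It shows why the approach is convenient, and also why it is costly: the whole object graph is written to a byte buffer and read back, whether or not the caller ever touches the copied values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Hypothetical helper, not part of Imixs-Workflow.
final class SerializationCopy {

    // Deep-copies a Serializable object graph by writing it to an in-memory
    // buffer and reading it back. Every reachable object is duplicated.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T source) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(source);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deep copy failed", e);
        }
    }

    public static void main(String[] args) {
        HashMap<String, List<String>> original = new HashMap<>();
        original.put("txtname", new ArrayList<>(List.of("Anna")));

        HashMap<String, List<String>> copy = deepCopy(original);
        copy.get("txtname").add("Bob");

        // The copy owns its own list, so the original is untouched.
        System.out.println(original.get("txtname")); // [Anna]
        System.out.println(copy.get("txtname"));     // [Anna, Bob]
    }
}
```

The cost grows with the size of the whole graph, which is why a single large value (for example a byte array attachment) makes every copy expensive even when only a small text item is read afterwards.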
it is up to the developer. with a lazy clone implementation you can choose which object you would like to lazily clone. but once you make a lazy clone of it, all the fields of that object that are neither primitive nor immutable will be lazily cloned, only when someone tries to directly access that field (for an aspect-oriented implementation) or tries to access that field through a get method (dynamic proxy implementation).
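The dynamic proxy variant described above can be illustrated with a minimal sketch. This is not the implementation from the paper or from any published library; `LazyCloneSketch`, `Order` and `SimpleOrder` are hypothetical names, and several simplifications are assumed: the target must be accessed through an interface, only no-argument `get`/`is` methods are intercepted, a small fixed set of types is treated as immutable, and a field is copied in full (via serialization) the first time its getter is called.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a lazy clone built on java.lang.reflect.Proxy.
final class LazyCloneSketch {

    // Dynamic proxies only work on interfaces, so the sketch needs one.
    interface Order {
        String getId();
        List<String> getItems();
    }

    static final class SimpleOrder implements Order {
        private final String id;
        private final List<String> items;
        SimpleOrder(String id, List<String> items) { this.id = id; this.items = items; }
        public String getId() { return id; }
        public List<String> getItems() { return items; }
    }

    // Returns a proxy that copies a getter's value on first access only.
    static <T> T lazyClone(Class<T> type, T original) {
        Map<String, Object> clonedFields = new HashMap<>(); // not thread-safe
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            boolean getter = args == null && (name.startsWith("get") || name.startsWith("is"));
            if (getter && clonedFields.containsKey(name)) {
                return clonedFields.get(name);          // already cloned earlier
            }
            Object value = method.invoke(original, args);
            if (!getter || isTreatedAsImmutable(value)) {
                return value;                            // safe to share
            }
            Object copy = copy(value);                   // clone on demand
            clonedFields.put(name, copy);
            return copy;
        };
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler));
    }

    private static boolean isTreatedAsImmutable(Object v) {
        return v == null || v instanceof String || v instanceof Number
                || v instanceof Boolean || v instanceof Character || v instanceof Enum;
    }

    // Copies a single field value; the value must be Serializable.
    private static Object copy(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) {
        Order original = new SimpleOrder("A-1", new ArrayList<>(List.of("pen")));
        Order clone = lazyClone(Order.class, original);

        clone.getItems().add("ink"); // first access copies the list, then changes the copy

        System.out.println(original.getItems()); // [pen]
        System.out.println(clone.getItems());    // [pen, ink]
    }
}
```

The sketch also makes the mutable-original caveat visible: if `original` changes after `lazyClone` is called but before a getter is first used on the proxy, the proxy will copy the changed value, not the value at clone time.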
with the lazy clone implementation with dynamic proxy (which is not as fast as the aspect one) it is possible. however, with the lazy clone implementation with aspects it would be a mere deep clone due to some limitations of my first implementation. anyway... i will try to devote some time to implementing a lazy clone library with the two implementations (dynamic proxies and aspects), since i can see there are many projects facing problems related to low maintainability and performance due to deep clones. when i have it done i will come back here and share it with you so we can test it in a non-critical part of this project. what do you think? best,
Thanks for these insights. Yes, it sounds interesting. I now have the following vision for the new implementation of the ItemCollection class: only the clone() method performs a full clone using serialization. In the critical part of the Imixs-Workflow engine I can explicitly call the clone() method to get a full deep copy, which is fine. But the constructor method:
will implement the faster lazy loading approach you mentioned. This will be a clear concept for developers. Most code fragments simply use the constructor with the map parameter, so I expect a performance boost overall. But I still wonder what happens in the following scenario:
all primitives are copied by value but things like byte arrays or embedded lists are ignored.
The save method of the DocumentService EJB is currently doing the following:
I expect that some extra work is needed here, as the HashMap needs to be fully loaded before I can clone it? Otherwise not all fields of my HashMap will be persisted, which is not what the client expects. What do you think about this scenario? By the way, I invited you to join the project.
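One possible way to handle the scenario above is to let the lazy copy expose an explicit "load everything now" step that is called before the data is serialized or persisted. The sketch below only illustrates that idea. `LazyCopyMap` and `materialize()` are hypothetical names, not ItemCollection API, and the `Map<String, List<Object>>` shape is an assumption about how the values are stored. For brevity it copies value lists and byte arrays only; other mutable values would still be shared and would need a deeper copy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lazy copy of a value map: entries are copied on first access.
final class LazyCopyMap {

    private final Map<String, List<Object>> source;                   // shared with the original
    private final Map<String, List<Object>> copied = new HashMap<>(); // owned by this copy

    LazyCopyMap(Map<String, List<Object>> source) {
        this.source = source;
    }

    // Copies the entry the first time it is requested, then reuses the copy.
    List<Object> get(String key) {
        List<Object> own = copied.get(key);
        List<Object> values = source.get(key);
        if (own == null && values != null) {
            own = copyValues(values);
            copied.put(key, own);
        }
        return own;
    }

    // Forces every remaining entry to be copied, e.g. before saving.
    Map<String, List<Object>> materialize() {
        for (String key : source.keySet()) {
            get(key);
        }
        return copied;
    }

    // Copies the list and any byte arrays in it; other values are shared.
    private static List<Object> copyValues(List<Object> values) {
        List<Object> result = new ArrayList<>(values.size());
        for (Object v : values) {
            result.add(v instanceof byte[] ? ((byte[]) v).clone() : v);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> data = new HashMap<>();
        data.put("name", new ArrayList<>(List.of("Anna")));
        data.put("file", new ArrayList<>(List.of(new byte[] {1, 2})));

        LazyCopyMap lazy = new LazyCopyMap(data);
        lazy.get("name").add("Bob");                          // copies only "name"
        Map<String, List<Object>> full = lazy.materialize();  // copies "file" as well

        System.out.println(data.get("name")); // [Anna]
        System.out.println(full.get("name")); // [Anna, Bob]
        System.out.println(full.size());      // 2
    }
}
```

With such a step, a save routine could call `materialize()` first and then apply its existing full copy to the result, so that no entry is silently left out of the persisted data.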
hey @rsoika sorry for the late reply, the last few days were quite busy.
that's a good question! if your clone method accesses the fields of your lazily cloned object, the lazy clone would work as expected by cloning each field on demand. but since your clone implementation is serialization based, we have to check how it would behave. as i told you, i'm trying to find time to work on a library that implements the lazy clone approach and make it available to everyone who wants to use it. but it may take some time until i have it fully implemented due to lack of time right now. so, as soon as i have it done i will come back here for sure and share it with you. btw... thank you for inviting me to join the project. it is a pleasure and i hope i can contribute as soon as i have that implementation.
The deepCopy method from the ItemCollection is the most expensive method concerning overall performance.
We cannot optimize or improve this method further, but we can avoid an expensive deepCopy in many cases by replacing
with
Candidates for this performance improvement are:
The method createByReference
should be replaced with something like: