-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PathCollisionValidator is not very memory efficient #49
Comments
Thank you for the feedback @cranphin. I'm glad to hear that you find this project useful. I understood your design from the description and I think your design for |
@cranphin: Here's an update on my plan. For FileContentReplace I'm using Google's Jimfs to keep track of SVN files in memory. I'm planning to adapt that to make Might need some hacks to deal with the revision though. |
Awesome :) So I guess if I want that, I will have to skip svndumpfilter entirely, and create the filtered file myself. Possibly by starting at the revesion/path I want the history for, then backtracking all sources/copies back into history, keeping a list, and then creating a dump file from that list in forward order. Not that easy XD Ran out of time for now anyways, mayby somday based on your code ;) Cheers :) |
Oh, and I like the use of Jimfs, don't know anything about it, but sounds like an interesting way to do it ;) |
Yes, that's the hard part, the fact that a file's path is dependent on the revision that it exists in. That forces us to have to track paths through revisions. I was able to use Jimfs to get basic test cases to work, however I ran into a scenario similar to the one you speak of: A file existed at some point in the past, but doesn't exist now. Since Jimfs reflects the paths of the previous revision only, it is not a sufficient solution. My current thinking is to create a Conditionally Linked Tree data structure. The traversal of the tree will be controlled by a condition (in our case whether that node existed at a specific revision). The tree will be maintained across revisions, and tree nodes will have a range of revisions for which they exist in the tree. I'm sure a Conditionally Linked Tree is not a novel idea since it's just another way of implementing custom tree traversal, but I have not been able find to an implementation that puts the conditions in the tree nodes themselves. |
Well, I'm mostly where I want to be with my app :) For marking and storing all changes needed to keep the history for a certain path at a certain revision, I just keep a list of each revision and each change, and then walk backwards through both the changes and revisions. I then keep a list of 'roots' I'm interested in (each only contains a revision and path), each copy adds a new root (or multiple if some roots fall underneath the copied to path), each add deletes a root (since it didn't exist in the repo before the add), and each change matching a root is tagged to be included in the output. The hard parts is in the details, I actually keep two kinds of roots, ones which we want the subnodes of, and ones that are only parent paths (so the parent paths are also included :) ). I'll add a snippet of just the Analyzer, if you want to do something like that, it'll help a lot ;) |
PathCollisionValidator will choke quickly on a large repository, because revisionSnapshots stores a map with every full path in the repository for each revision. With many tags this becomes quite a lot quite fast, and Java runs out of memory :)
I solved that same issue (I'm doing something very similar to help me refactor a giant repository, but for work so I can't share code ;) ) by instead using a tree structure, with each node storing it's 'file' name and it's revision. The big trick is that for each new revision only the root node is replaced, the sub nodes are the nodes from the previous repository. On each add/delete/replace/modify the nodes in the path are also replaced, but no others. When adding a copy, again only the top node is new, and links to the nodes from the copied revision.
This creates a structure which holds every revision completely, in a easily traverse able format, so you can perform the checks, but which is far more memory use friendly.
Hope this makes sense, hope this helps ;)
The text was updated successfully, but these errors were encountered: