large inputs cause hang and never finish #5
Comments
Are you getting any output at all, for example an error or exception? Does the program exit successfully or does it hang? What’s your Python version? I haven’t tested with anything lower than 3.5. How big are your files? The required memory will be much more than the combined file sizes, so test with files that are a few kilobytes at most. I haven’t tested on Windows at all, and it may be that some implementation details have been missed. I’ll get a VM on Monday to test this, but if you could provide the above info it would help me :)
Hi, here are more details:
N E O N S E N S E positional arguments: optional arguments:
I set up a virtual machine and tested this on Windows 10 with Python 3.7, and everything seems to work fine. Did you install multidiff by running
Ahh... When I create two test .bin files of about 4 KB each, it finishes.
The limit is constrained by the available RAM on your computer, so I’m not sure, but I’d say it’s in the tens or hundreds of megabytes. The whole file will be printed too, so a large file might be a little impractical to work with. What kind of data are you looking at and what are you trying to find? Small differences in large files?
Yes, I'm trying to find small differences in large files.
Yes, that's due to Python's difflib always needing to diff the whole sequence. The difflib documentation says:
So in the worst-case (cubic) scenario, a 4 MB input takes 4^3 = 64 times as long as a 1 MB input, which is clearly too long for your use. I also assume you wouldn't really want to see all the matching parts, but only the differing ones? I think making the tool faster would need some work on the underlying diffing algorithms, which is something I'm unlikely to have time for in the near future.
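To illustrate the "only the differing parts" idea above, here is a minimal sketch that uses the standard library's `difflib.SequenceMatcher` directly and skips the `equal` opcodes. This is not multidiff's own code; the helper name `differing_regions` is made up for this example, and the diff is still computed over the whole input, so it does not make large files any faster.

```python
import difflib


def differing_regions(a: bytes, b: bytes):
    """Yield (tag, a_start, a_end, b_start, b_end) for non-matching regions only.

    Hypothetical helper, not part of multidiff.
    """
    # autojunk=False disables the "popular element" heuristic, which would
    # otherwise treat very common byte values as junk on longer inputs.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            yield tag, i1, i2, j1, j2


if __name__ == "__main__":
    old = b"hello world, this is a test"
    new = b"hello World, this is a best"
    for tag, i1, i2, j1, j2 in differing_regions(old, new):
        print(f"{tag:8} a[{i1:#06x}:{i2:#06x}]={old[i1:i2]!r} "
              f"b[{j1:#06x}:{j2:#06x}]={new[j1:j2]!r}")
```

This only trims the output; the running time is still dominated by the matching step, which is the part that hangs on large inputs.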
I modified the multidiff library to only show the addresses where the bytes have changed. The output looks something like this: https://pastebin.com/csT3dpRK I tested it on a 2 MB file and it took approximately 10 minutes.
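For comparison, when the two files are known to differ only by in-place byte changes (no insertions or deletions), the changed addresses can be found with a plain positional comparison in linear time and constant memory. The sketch below is an assumption-laden illustration, not the modification described above and not multidiff's API; `changed_offsets` and its `chunk_size` parameter are hypothetical names.

```python
import io


def changed_offsets(stream_a, stream_b, chunk_size=1 << 16):
    """Yield (offset, byte_a, byte_b) for each position where the streams differ.

    Assumes both arguments are binary file-like objects and that the data is
    position-aligned. A byte is reported as None past the end of the shorter
    stream. Hypothetical helper, not part of multidiff.
    """
    offset = 0
    while True:
        chunk_a = stream_a.read(chunk_size)
        chunk_b = stream_b.read(chunk_size)
        if not chunk_a and not chunk_b:
            break
        longest = max(len(chunk_a), len(chunk_b))
        if chunk_a != chunk_b:  # cheap check before the per-byte loop
            for i in range(longest):
                byte_a = chunk_a[i] if i < len(chunk_a) else None
                byte_b = chunk_b[i] if i < len(chunk_b) else None
                if byte_a != byte_b:
                    yield offset + i, byte_a, byte_b
        offset += longest


if __name__ == "__main__":
    first = io.BytesIO(b"\x00\x01\x02\x03")
    second = io.BytesIO(b"\x00\xff\x02\x04")
    for offset, old, new in changed_offsets(first, second):
        print(f"{offset:#010x}: {old!r} -> {new!r}")
```

With real files, the streams would come from `open(path, "rb")`. A single inserted or deleted byte shifts everything after it, so this approach reports every later position as changed; that case still needs a real diff algorithm.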
Great, if you want me to merge those changes, just make a pull request, but add a command-line flag for the feature. I think the hang issue is related to calculating the diff rather than outputting the result, so I won't be closing this issue :)
I have installed multidiff and ran this command on Windows 10:
“multidiff test1.bin test2.bin”
It doesn't work...