use os.walk for recursing #48
Benchmark results with:

    # mkdirs.py
    from os import mkdir
    from os.path import join

    def mkdirs(parent, depth=0):
        if depth == 5:
            for f in range(10):
                with open(join(parent, str(f)), 'w'):
                    pass
            return
        depth += 1
        for d in range(10):
            newdir = join(parent, str(d))
            mkdir(newdir)
            mkdirs(newdir, depth)

    if __name__ == '__main__':
        mkdirs('/storage/test')

That's 111111 directories (and 1 million files). Testing with:

    # test.py
    from inotify.adapters import InotifyTree

    if __name__ == '__main__':
        i = InotifyTree('/storage/test')

This is happening on a rotational drive. The benchmark (I did several runs and chose the best result):
So that's about 1/4th of the time. Just out of curiosity, I tried old code for comparison (without flushing cache):
So, a negligible difference between the os.listdir() loop and os.walk() when os.scandir() is not involved. As I mentioned, this was on an HDD. On an SSD the results are:
Not so dramatic, but still a serious improvement.
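For anyone who wants to try a comparison of this kind on their own machine, here is a minimal sketch of a timing harness. It is not the benchmark used above: the function names (`count_dirs_listdir`, `count_dirs_walk`, `build_tree`, `compare`) are made up for illustration, the tree is tiny and lives in a temporary directory, and everything runs with a warm cache, so the numbers will not resemble the cold-cache HDD figures.

```python
import os
import tempfile
import timeit


def count_dirs_listdir(root):
    """Traverse with os.listdir() plus one os.path.isdir() call per entry."""
    count = 0
    queue = [root]
    while queue:
        current = queue.pop()
        count += 1
        for name in os.listdir(current):
            path = os.path.join(current, name)
            # isdir() costs an extra stat() per entry. It also follows
            # symlinks, unlike os.walk() with its default followlinks=False.
            if os.path.isdir(path):
                queue.append(path)
    return count


def count_dirs_walk(root):
    """Traverse with os.walk(), which is built on os.scandir() since Python 3.5."""
    return sum(1 for _ in os.walk(root))


def build_tree(parent, depth, fanout):
    """Create `fanout` subdirectories per level, `depth` levels deep."""
    if depth == 0:
        return
    for i in range(fanout):
        child = os.path.join(parent, str(i))
        os.mkdir(child)
        build_tree(child, depth - 1, fanout)


def compare(depth=3, fanout=3, repeat=3):
    """Return (directory counts, best-of-`repeat` timings in seconds)."""
    with tempfile.TemporaryDirectory() as root:
        build_tree(root, depth, fanout)
        counts = (count_dirs_listdir(root), count_dirs_walk(root))
        timings = {
            "listdir": min(timeit.repeat(lambda: count_dirs_listdir(root),
                                         number=1, repeat=repeat)),
            "walk": min(timeit.repeat(lambda: count_dirs_walk(root),
                                      number=1, repeat=repeat)),
        }
    return counts, timings


if __name__ == "__main__":
    print(compare())
```

Both traversals should report the same number of directories on a tree without symlinks; only the timings are expected to differ, and by how much depends on the Python version, filesystem, and cache state.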
@@ -11,12 +11,9 @@ def temp_path():
path = tempfile.mkdtemp()

original_wd = os.getcwd()
os.chdir(path)
Let's leave these in, please.
Is it still relevant? There is some explanation here: #48 (comment)
@@ -166,7 +166,10 @@ def test__cycle(self):

i = inotify.adapters.InotifyTree(path)

with open('seen_new_file1', 'w'):
watches = i._i._Inotify__watches
Can you give me an overview on why this file has to change (aside from fully-qualifying the paths)?
@dsoprea can't you just merge this one and roll back the unwanted/unneeded change in inotify/test_support.py, as @xlotlu doesn't seem to respond? Then the pending #31 and #30 could be properly reimplemented and all pending pull requests merged. And I would have one more reason to switch to Python >3.4... The question regarding the changes in tests/test_inotify.py can be answered briefly: they are needed because the access pattern changes with os.walk. Every directory is scanned by os.walk after it has been added to inotify, because no intermediate list is used. (Which is the proper way: first, it's shorter and probably, all in all, more efficient code; second, no directories will be missed if they are created between listdir and add_watch, as could happen with the old method.)
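The "watch first, scan afterwards" ordering can be sketched as follows. This is an illustration only, not the code from this PR: `watch_tree` and the `add_watch` callback are hypothetical names standing in for whatever registers an inotify watch, and the demo records paths in a list instead of touching inotify at all.

```python
import os
import tempfile


def watch_tree(root, add_watch):
    """Call add_watch(path) for root and for every directory below it."""
    # os.walk() is a generator, so root is not scanned until the loop starts.
    add_watch(root)
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            # Top-down os.walk() descends into these entries only after the
            # loop body finishes, so each subdirectory is registered before
            # it is scanned. Symlinks to directories also appear here but are
            # not descended into with the default followlinks=False.
            add_watch(os.path.join(dirpath, name))


def demo():
    """Build a small tree and return the watched paths relative to its root."""
    watched = []
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a", "b"))
        os.makedirs(os.path.join(root, "c"))
        watch_tree(root, watched.append)
        return [os.path.relpath(p, root) for p in watched]


if __name__ == "__main__":
    print(demo())
```

With this ordering, a directory created after its parent is watched is either picked up by the later scan or reported as an event. The cost is that the scans themselves happen on watched directories, which is presumably why tests that assume a fixed event sequence need adjusting.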
@xlotlu isn't responding because it's been a really long time since my commit got any attention, and the specifics are vague, and I ended up not using this project in production. But, IIRC, @Elias481's explanation above is precisely why the changes are needed. Also, the old tests made assumptions about the access order which don't hold true with I'm not sure though why I decided to remove that
or maybe I thought the code smells and could trigger strange behaviour, or all of the above.
By the way, in the meantime I discovered the reason why my benchmarks under #48 (comment) didn't yield different results with / without dentry cache:
Use os.walk() for recursing instead of the custom os.listdir() loop, and adapt the tests.
This should improve efficiency on Python >= 3.5, where os.walk() uses the new os.scandir() under the hood.
Related to #45.