The iterate-dir function consumes all available heap and throws an OOME on large directory structures.
The following typescript demonstrates this problem in a couple of different ways when the function is presented with a directory containing approximately 600,000 files & sub-directories (note: embedded ANSI escape characters have been manually removed from this typescript for clarity):
Script started on Thu Jan 3 22:45:39 2013
bash-3.2$ ls -R /Users/pmonks/Development | wc -l
ls: unreadableDirectory: Permission denied
614630
bash-3.2$ lein repl
nREPL server started on port 52181
REPL-y 0.1.0-beta10
Clojure 1.4.0
Exit: Control+D or (exit) or (quit)
Commands: (user/help)
Docs: (doc function-name-here)
(find-doc "part-of-name-here")
Source: (source function-name-here)
(user/sourcery function-name-here)
Javadoc: (javadoc java-object-or-class-here)
Examples from clojuredocs.org: [clojuredocs or cdoc]
(user/clojuredocs name-here)
(user/clojuredocs "ns-here" "name-here")
fs-scan.core=> (require '[fs.core :as fs])
nil
fs-scan.core=> (defn walker [root dirs files] ())
#'fs-scan.core/walker
fs-scan.core=> (fs/walk walker "/Users/pmonks/Development")
OutOfMemoryError Java heap space java.util.Arrays.copyOf (Arrays.java:2882)
fs-scan.core=> (fs/iterate-dir "/Users/pmonks/Development")
OutOfMemoryError Java heap space java.util.Arrays.copyOf (Arrays.java:2882)
fs-scan.core=> (do (fs/iterate-dir "/Users/pmonks/Development") ())
OutOfMemoryError Java heap space java.util.Arrays.copyOf (Arrays.java:2882)
fs-scan.core=> exit
Bye for now!
bash-3.2$ exit
exit
Script done on Thu Jan 3 22:53:42 2013
I believe this is occurring because iterate-dir is not lazy (despite the doc comment), and is eagerly building the entire sequence of pathnames in memory.
For my use case, this issue appears when using the walk function. Basically I want to be able to walk very large directory structures (10s to 100s of millions of files, transitively), processing as I go.
I see. The problem is that the zipper used under the hood holds the whole tree in memory. I'll get a fix in asap. Should just be a tree-seq (I didn't write this code. I never write code that blows the heap, you see ;)).
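For reference, a minimal sketch of what a tree-seq based walk could look like. This is not the library's actual fix, just an illustration of the idea; `lazy-files` is a hypothetical name, and it assumes plain `java.io.File` interop rather than anything from fs.core.

```clojure
(import 'java.io.File)

(defn lazy-files
  "Hypothetical helper: returns a lazy, depth-first seq of java.io.File
  objects for root and everything beneath it."
  [root]
  (tree-seq
    ;; branch?: only directories have children
    (fn [^File f] (.isDirectory f))
    ;; children: .listFiles returns nil for unreadable directories,
    ;; and (seq nil) is nil, so those are simply skipped
    (fn [^File d] (seq (.listFiles d)))
    (File. (str root))))
```

Consumed with something like `(doseq [f (lazy-files "/Users/pmonks/Development")] (println (.getPath f)))`, nothing holds on to the head of the seq, so entries already processed can be garbage collected. Each directory's own listing is still read into memory in one go by `.listFiles`, so a single directory with a huge number of entries would remain costly.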