Make the Sequence type functional #26

bordoley · 2017-04-12T23:34:38Z

The definition of Sequence requires the use of side effects and provides no way to abandon iteration. At the surface, Sequence appears to provide similar functionality to Immutable.re's Iterable type, with less flexibility. I would suggest either changing to Sequence to implement an ocaml compatible monadic sequence, or an api similar to Immutable.re's Iterable, which is implemented using a reduceWhile function, taking advantage of a GADT to avoid allocations for empty Iterables. See: https://github.com/facebookincubator/immutable-re/blob/master/src/core/Iterable.re

bordoley · 2017-04-12T23:35:50Z

Another option would be to implement Immutable.re's Iterator module instead.

c-cube · 2017-04-20T21:47:50Z

Just curious, why is Sequence less flexible? Side effects are not exposed to the user anyway…

ghost · 2017-04-20T23:01:38Z

@c-cube on a not so related note, could you enlighten me on some use cases of iterators other than say populating a binary tree with a list without knowing their internal structure? I find most the time I can just use List.map etc to iterate over a some structure? Thanks.

bordoley · 2017-04-20T23:10:51Z

@c-cube: I meant that Immutable.re's Iterable module is more flexible and to a degree more pure (even though some operators use side effects internally) as it is implemented using a reduction function that supports early termination via a while check. This allow implementation of more operators such as take(). The monadic iterator's main benefits are that it supports zipping and eager seeking (Immutable.re's Sequence). The reduce/while approach is faster, since most operators can be implemented allocation free.

@dorafmon: In general, whenever chaining operators you should prefer the iterator/sequence approach. In the stdlib, functions such as List.map, etc. constantly allocate memory, since they are eager. List.map is actually particularly bad because the default implementation is not even tail recursive.

c-cube · 2017-04-21T06:57:29Z

@dorafmon I use them a lot for various purpose (sequence is very very low overhead):

in algorithmic code, to iterate on nested structures, thanks to flat_map (or >>=), typically arrays of trees of trees, and then filter/map/fold on labels of these trees
for printing any container using only one printing combinator on sequences
to convert between structures, gather the values in a hashtable, etc.
for basic IO, it's useful to iterate on the lines of a file or the files in a directory in a uniform way (and without reading the whole file beforehand)

the debate as to which iterator type is the best has been going on for a while, without clear agreement. I'll just say that Sequence is very efficient for most uses (as can be shown in benchmarks, especially with flambda), that it is structural (can define it in several libraries in a compatible way, unlike a fold approach that requires a nominal record for the quantification) and that it works with stdlib structures, such as Queue.t or Hashtbl.t.

For reference, slides of a talk I gave on sequence.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make the Sequence type functional #26

Make the Sequence type functional #26

bordoley commented Apr 12, 2017

bordoley commented Apr 12, 2017

c-cube commented Apr 20, 2017

ghost commented Apr 20, 2017

bordoley commented Apr 20, 2017

c-cube commented Apr 21, 2017

Make the Sequence type functional #26

Make the Sequence type functional #26

Comments

bordoley commented Apr 12, 2017

bordoley commented Apr 12, 2017

c-cube commented Apr 20, 2017

ghost commented Apr 20, 2017

bordoley commented Apr 20, 2017

c-cube commented Apr 21, 2017