Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CHAMP algorithm is current state of the art for immutable/persistent collections #6

Closed
mikehearn opened this issue Jul 21, 2016 · 15 comments

Comments

@mikehearn
Copy link

I have nothing against using the PCollections library but the CHAMP algorithm is the current state of the art for JVM based immutable collection implementations:

https://blog.acolyer.org/2015/11/27/hamt/

You can find an implementation here:

https://github.com/usethesource/capsule

The footprint improvements are quite impressive. It'd be a shame if Kotlin shipped a library that wasn't using the current best research into the field, especially as an implementation already exists. It'd be a nice competitive advantage over other languages.

@hastebrot
Copy link

This would be very great.

Just for reference: A while ago I found this older implementation of a HAMT
in Kotlin: https://github.com/KenBarclay/KData I'm not quite sure if this
uses the CHAMP algorithm.

On Jul 21, 2016 12:36 PM, "Mike Hearn" [email protected] wrote:

I have nothing against using the PCollections library but the CHAMP
algorithm is the current state of the art for JVM based immutable
collection implementations:

https://blog.acolyer.org/2015/11/27/hamt/

You can find an implementation here:

https://github.com/usethesource/capsule

The footprint improvements are quite impressive. It'd be a shame if Kotlin
shipped a library that wasn't using the current best research into the
field, especially as an implementation already exists. It'd be a nice
competitive advantage over other languages.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#6, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAeSf6vaCm6aJxeP6QnTF94Eui-8dVBNks5qX0uvgaJpZM4JRqbW
.

@ilya-g
Copy link
Member

ilya-g commented Jul 21, 2016

@mikehearn as far as I understand, CHAMP is a data structure suitable for implementing hash maps and sets.

Do you know, is it possible to use CHAMP for ordered sets and maps (those that preserve element insertion order) and indexed lists?

@pniederw
Copy link

pniederw commented Jul 22, 2016

Last time I checked PCollections, it seemed to be considerably less state-of-the-art than Clojure's and Scala's persistent collections.

@ilya-g The author of capsule answered my questions on the scope of the project here: usethesource/capsule#2

At this time capsule feels more like a research prototype to me, and the project hasn't had much activity lately. However the author will deliver a talk at the upcoming JVMLS.

The most state-of-the-art persistent collections library implemented in Java and (supposedly) production-ready that I could find is https://github.com/GlenKPeterson/UncleJim. The starting point for this library was Clojure's persistent collections implementation. The author is also working on a new and improved persistent vector: https://github.com/GlenKPeterson/UncleJim/tree/2016-05-22_RRB-Tree
The author's own comparison to PCollections is here: https://github.com/GlenKPeterson/UncleJim/wiki/UncleJim-vs.-PCollections

@mikehearn
Copy link
Author

I don't know the answer to that, sorry.

@msteindorfer
Copy link

As the author of the CHAMP paper and the usethesource/capsule library, I want to jump in an answer some questions you're having.

@ilya-g, yes CHAMP can be used to implement hash-sets or hash-maps that preserve element insertion order. I implemented such a data structure last year for another project and now started to back port that data structure to usethesource/capsule. You can expect the implementation to appear in the codebase by September.

@ilya-g, @mikehearn, indeed, CHAMP is suitable for implementing hashed data structures. For implementing immutable indexed lists, the best candidate is currently the RRB Vector. Sorted data structures (e.g., that take a comparator) are probably still implemented best by a balanced binary tree (e.g., such as a red-black tree).

@pniederw et al., usethesource/capsule is more than a research prototype. It powers the builtin data structures of the http://www.rascal-mpl.org programming language. But were you're right is that it still needs polishing and work to make it more appealing to wider audience. From September on, I'll have more time for engineering than for research, therefore you can expected a lot of activity on the project.

Overall the aim of usethesource/capsule is to be a vehicle for implementing cutting edge research results on immutable data structures, and the current focus lies on hashed collections. I'm happy to discuss my roadmap with you and to see how common interests can be aligned.

@GlenKPeterson
Copy link

GlenKPeterson commented Aug 15, 2016

I really think the Clojure collections should be considered. My understanding is that Scala collections grew out of Rich Hickey's implementations of Okasaki's Purely Functional Data Structures and Bagwell/Rompf's papers. Hickey had many critical insights and brought Okasaki's ideas into the mainstream (at least for the JVM world) by adding some of his own and providing a highly performant, working implementation.

We've been using Paguro's type-safe versions of the Clojure collections in production for about 2 years now. When we find bugs, it's in the code that doesn't use these collections. I've been using Paguro in Kotlin as-is for about 3 or 4 months now. We could add some NotNull annotations, but that's about all I see necessary for ideal Kotlin use right now.

Even if you aren't interested in Paguro, I think the original Clojure collections deserve mention on the front page of this project. Without them, there might not be the Scala ones. And Clojure's transformation api is beautifully simple.

@norswap
Copy link

norswap commented Sep 21, 2016

I'm adding my vote for CHAMP.

I implemented a immutable map in Kotlin using the approach:
https://github.com/norswap/triemap

It's a learning project: It doesn't have the niceties that capsule nor compatibility with the Java collections API. But it should give you a sense of what's involved. Once you've grasped the basics, the implementation isn't terribly hard.

@danieldietrich
Copy link

danieldietrich commented Mar 20, 2017

Hi, Daniel chiming in :)

Disclaimer: I'm the creator of Javaslang

The ideas presented here are very interesting.

I'm not sure if the best way to provide interoperability with standard Java collections is to implement java.util.* collection interfaces. The mutator methods (i.e. the ones that return void) can only throw at runtime - which is unsafe. Javaslang's persistent collections for example will provide views for that purpose.

Javaslang was created to bring a big portion of the persistent/immutable part of Scala to Java (8). We benchmark our collections against Scala, Clojure, PCollection, Functional Java, Eclipse Collections and Java mutable collections. Here is how our Vector performs for example (> 1 := faster than).

We based our Hash* collections on Hash Array Mapped Tree (HAMT), the Sorted* collections on a Red/Back-Tree and the Vector on a Bit Mapped Trie (BMT). This is the collection hierarchy so far (upcoming 2.1.0):

javaslang-2-1-0-alpha

Here is a comparison of the APIs of two persistent Vector implementations.

Left: Javaslang / Right: PCollections

javaslang-vs-pcollections

@l0rinc
Copy link

l0rinc commented Mar 20, 2017

Here is how our Vector performs for example (> 1 := faster than).

To make it clear what I meant by PCollections being slower:

Operation   Ratio                                                 10      100     1026 
Create      slang_persistent/pcollections_persistent          29.70×  114.82×  243.27×
Head        slang_persistent/pcollections_persistent           1.75×    2.43×    2.81×
Tail        slang_persistent/pcollections_persistent           4.38×    6.00×    7.59×
Get         slang_persistent/pcollections_persistent           4.86×    6.08×    8.99×
Update      slang_persistent/pcollections_persistent           1.73×    2.34×    3.07×
Prepend     slang_persistent/pcollections_persistent           0.32×    0.74×    1.31×
Append      slang_persistent/pcollections_persistent           0.51×    0.74×    1.09×
Slice       slang_persistent/pcollections_persistent           0.53×    0.38×    0.40×
Iterate     slang_persistent/pcollections_persistent          13.31×   10.68×    8.02×

i.e. except for slice (which has a memory leak in Java, Clojure and PCollections, i.e. a wrapper over all elements), Javaslang is several times faster, consistently.

I think Kotlin is a very nice new language, we would be glad to incorporate techniques from CHAMP into Javaslang (where applicable), if that would help its Kotlin integration :)

@zvozin
Copy link

zvozin commented Apr 9, 2017

@danieldietrich I, for one, have been using Javaslang in Kotlin a while now as is and loving it. A wrapper that turns static utility methods like

Try<Seq<T>> sequence(Iterable<? extends Try<? extends T>> values)

into extensions is all that I would wish for, and would in fact volunteer to come up with such a package myself - I have half of it spread across my projects already anyway :).

@l0rinc
Copy link

l0rinc commented Apr 9, 2017

@zvozin, I think that would be a very valuable contribution! :)

@danieldietrich
Copy link

danieldietrich commented Apr 10, 2017

@zvozin That sounds really great! I will setup an empty Github repository javaslang-kotlin and invite you (with write privileges).

@Shusek
Copy link

Shusek commented Aug 23, 2018

Any update for it? It can be awesome feature if kotlin collection do not require copy all collection when 'add' operator is called in list.

@ilya-g
Copy link
Member

ilya-g commented Aug 27, 2018

@Shusek There's currently work in progress in the PR #20 with new implementations of persistent collections. The PersistentHashMap implementation is based on CHAMP, iirc.

Feel free to join the review if you're interested.

@ilya-g
Copy link
Member

ilya-g commented Jul 23, 2019

PersistentHashMap implementation in 0.2 uses CHAMP data structure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests