The future of Hive #246

Closed
simc opened this issue Feb 28, 2020 · 148 comments

Comments

@simc
Member

simc commented Feb 28, 2020

TLDR: Hive 2.0 will be rewritten in Rust to get amazing performance, multithreaded queries, read and write transactions, and super low memory usage. The code will work 1:1 in the browser.

Situation

I have thought for a long time about how to implement queries correctly in Hive, and I have come to the conclusion that it is not possible with the current architecture.
I have reviewed many projects on GitHub which use Hive, and most of them have to create their own suboptimal workaround for queries.
Apart from queries, Hive has another problem: Dart objects use a lot of RAM. Since Hive currently keeps at least the keys in RAM, you can hit the OS memory limit of mobile devices quite fast.

I also created polls some time ago on the Hive documentation page and there were two very strong takeaways:

  1. Queries are something almost every user wants
  2. An overwhelming majority (86%) of users don't mind breaking changes

Idea

So here is what I have come up with:
I will completely rewrite Hive in Rust. I will use the strengths of the old implementation (like auto migration) and fix the issues.
On the VM, Hive will use LMDB as its backend, and in the browser, IndexedDB. The VM implementation will provide the same features as IndexedDB to allow easy code sharing.
The two main goals of Hive will stay the same: Simplicity and Performance.

I have a small prototype and the performance is amazing. LMDB has to be some kind of black magic.

Sample

Here is how it is going to work:

The model definition is very similar to current models:

@HiveType(typeId: 0)
class Person {
  @Primary
  int id;

  @HiveField(fieldId: 0)
  @Index(unique: false)
  String name;

  @HiveField(fieldId: 1)
  int age;
}

Hive will then generate extension methods so you can write the following query:

var box = Hive.openBox<Person>('persons');
box
  .query()
  .where()
  .nameEquals('David')
  .or()
  .nameStartsWith('Lu')
  .filter()
  .ageBetween(18, 50)
  .sortedByAge();

where() vs filter()

The difference between where() and filter() is that where() uses an index, while filter() needs to test each of the resulting elements. Normally, a database decides by itself when to use an index, but Hive will let you make that choice explicitly.

There are multiple reasons for that:

  1. This code will work 1:1 with IndexedDB
  2. You know your data best and can choose the perfect index
  3. The database code will be significantly easier
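
To make the where() versus filter() distinction concrete, here is a minimal Dart sketch of the idea, not Hive's implementation. PersonBox, whereNameStartsWith and filterAgeBetween are hypothetical names, the index is an in-memory SplayTreeMap standing in for a real on-disk index, and the age bounds are assumed to be inclusive.

```dart
import 'dart:collection';

class Person {
  final int id;
  final String name;
  final int age;
  const Person(this.id, this.name, this.age);
}

// Hypothetical toy box: objects by id plus a sorted index on `name`.
// Assumes each id is inserted only once (no index maintenance on update).
class PersonBox {
  final Map<int, Person> _byId = {};
  final SplayTreeMap<String, List<int>> _nameIndex = SplayTreeMap();

  void put(Person p) {
    _byId[p.id] = p;
    _nameIndex.putIfAbsent(p.name, () => <int>[]).add(p.id);
  }

  // where()-style: walk only the index keys that start with `prefix`.
  List<Person> whereNameStartsWith(String prefix) {
    final result = <Person>[];
    // First candidate: the prefix itself if indexed, else the next larger key.
    String? key = _nameIndex.containsKey(prefix)
        ? prefix
        : _nameIndex.firstKeyAfter(prefix);
    while (key != null && key.startsWith(prefix)) {
      for (final id in _nameIndex[key]!) {
        result.add(_byId[id]!);
      }
      key = _nameIndex.firstKeyAfter(key);
    }
    return result;
  }
}

// filter()-style: no index, every candidate is tested one by one.
List<Person> filterAgeBetween(
    Iterable<Person> candidates, int lower, int upper) {
  return candidates.where((p) => p.age >= lower && p.age <= upper).toList();
}

void main() {
  final box = PersonBox()
    ..put(const Person(1, 'David', 30))
    ..put(const Person(2, 'Lucas', 25))
    ..put(const Person(3, 'Luna', 70))
    ..put(const Person(4, 'Mia', 40));

  final byIndex = box.whereNameStartsWith('Lu'); // Lucas, Luna
  final result = filterAgeBetween(byIndex, 18, 50); // Lucas
  print(result.map((p) => p.name).toList()); // prints [Lucas]
}
```

The index lookup touches only the keys in the matching range, while the filter step runs a predicate over whatever the index returned, which is why narrowing with where() first keeps the filter cheap.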

Things to figure out

  • How can auto-updating queries be implemented efficiently?
  • What are the restrictions for shipping binaries on iOS?
  • Should there still be a key-value store?

Blocking Issues (pls upvote)

Other issues

For existing apps using Hive 1.x:

I will continue to keep a Hive 1.x branch for bugfixes etc.

What do you think?

@simc simc pinned this issue Feb 28, 2020
@kaboc

kaboc commented Mar 2, 2020

Rewriting in Rust sounds so interesting. Better performance and less memory usage are attractive and welcome.

However, I'm seriously worried about the compatibility of boxes. Can boxes from v1.x still be used in v2 as well? It's unclear what other users actually meant in the polls. For me, only breaking changes to the APIs are acceptable, not to the box format.

I have to decide whether to leave Hive and choose another package for the app I'm currently developing if there is a risk that I need to go through all the trouble to port old boxes to new ones on my own sometime in the future.

Having said that, your plan is exciting all the same. I look forward to seeing the first release of the new major version.

@simc
Member Author

simc commented Mar 2, 2020

Yes, I agree. It is bad that old boxes will not be compatible and that existing apps in production cannot upgrade to the new version without losing their data.

Hive is very young and I still think it is the right path. For future breaking changes there will be auto migration. Unfortunately that is not possible for this change because we switch the backend.

@shinayser

shinayser commented Mar 2, 2020

A noob question: how will you make Rust work with Dart?

@simc
Member Author

simc commented Mar 2, 2020

Using Dart FFI

@shinayser

shinayser commented Mar 2, 2020

Using Dart FFI

But Dart FFI is only for the C language, not Rust, right?

Will it require the user to use FFI, or are you planning to provide a ready-made Dart interface?

@simc
Member Author

simc commented Mar 2, 2020

But Dart FFI is only for the C language, not Rust, right?

Rust does provide C interop...

Will it require the user to use FFI, or are you planning to provide a ready-made Dart interface?

The user will only use Dart and does not even notice the Rust backend.
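
As a rough illustration of why this works: Dart FFI binds to any symbol exported with the C ABI, regardless of which language produced it. The sketch below binds to the C standard library's abs, chosen only because it is available in most processes without building anything. The typedef and function names are hypothetical, and DynamicLibrary.process() is assumed to run on Linux, macOS, or Android (it is not supported on Windows).

```dart
import 'dart:ffi';

// C signature: int abs(int). A Rust library would expose the same kind of
// C-ABI symbol with `#[no_mangle] pub extern "C" fn ...`.
typedef AbsNative = Int32 Function(Int32);
typedef AbsDart = int Function(int);

// Hypothetical helper: looks up `abs` among the symbols already loaded
// into the current process and calls it through FFI.
int nativeAbs(int value) {
  final lib = DynamicLibrary.process();
  final abs = lib.lookupFunction<AbsNative, AbsDart>('abs');
  return abs(value);
}

void main() {
  print(nativeAbs(-42)); // prints 42
}
```

A package can hide a wrapper like nativeAbs behind an ordinary Dart API, which is how the user would only ever see Dart.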

@Mravuri96

You should give https://vlang.io/ a shot 😜

@simc
Member Author

simc commented Mar 3, 2020

You should give https://vlang.io/ a shot

V is interesting but in my opinion, there are multiple reasons why it is not a good idea to use V at the moment:

  • V is still very unstable
  • V has a very small community and not many packages
  • V is a very "basic" language. While the creator argues that this is a design choice, I expect languages to provide at least a Set data structure.
  • I think Rust is superior to V in almost every aspect. There are many zero-cost abstractions like iterators.

@MarcelGarus
Contributor

What initially excited me about Hive is that it's a pure Dart library without external dependencies, so it runs everywhere.
Obviously, the same would be true if the backend were implemented in Rust, but I begin to wonder: there are loads of existing database implementations in Rust that are far more advanced. There are of course the usual SQLite-ish standards, but also document-based databases like MongoDB and truly innovative approaches like this one.
I'm afraid there's nothing about Hive that's fundamentally better than other database solutions, so rather than reinventing the wheel, why not use some existing database and build a nice Dart wrapper around it? Because developers also use the Rust database on its own, there would be more users and more contributors, and all developers from both the Rust and the Flutter communities would benefit from the research, optimizations, and bug fixes implemented on the Rust side.
This package could simply focus on providing the most intuitive Dart API possible, which would make maintaining the package easier as well.

@simc
Member Author

simc commented Mar 12, 2020

What initially excited me about Hive is that it's a pure Dart library without external dependencies

Yes, that was the goal, but it turned out that most users don't want a database that is basically an in-memory KV store. The problem with Dart is that it is kind of slow, its objects are rather memory hungry, and it lacks essential features needed to implement a more advanced database.

There are loads of existing database implementations in Rust that are far more advanced

I thought the same thing but the list of candidates is short. In fact, I didn't find a single database that is suitable for mobile devices and our requirements.

Also, to my knowledge, there is no database that is built as a counterpart to IndexedDB. It is not trivial to write a database that works exactly the same in the browser. IndexedDB is very different from most other databases.

I'm afraid there's nothing about Hive that's fundamentally better than other database solutions

As I said, I don't think there exists a single cross-platform database that also works in the browser and I don't think existing databases can be easily used with Dart and still have great performance. Realm, for example, will never work with Dart because it relies on proxy objects.

So I'm basically writing an abstraction around IndexedDB and LMDB in Rust which can be compiled to a binary or to WASM.

And then there will be the Dart wrapper around this "backend".

It should be easily possible to use only the Rust side, for example with React Native.

Edit: I already have a fully working prototype of the LMDB part of the wrapper and not much Rust code is required. The performance is exceptional.

If you have an alternative approach that allows us to have a "real" database which also works in the browser, I'd love to discuss it.

Edit2: Another advantage of this approach is that breaking changes of the binary format will no longer be required and bugs that corrupt the database will not happen anymore because the storing of the data will be handled by LMDB and IndexedDB respectively.

Edit3: Like most other databases, Noria, the one you linked, is for backends and thus not really suitable for mobile devices.

@MarcelGarus
Contributor

Okay, I see. I was really expecting more lightweight Rust databases to exist.
Then I take back what I said before — the Rust-Dart-architecture seems to be a great fit 😊
I'm looking forward to using this

@simc
Member Author

simc commented Mar 12, 2020

There is one topic where I still need input: Since we are rebuilding Hive anyway, I'd like to make it ready for synchronization from the beginning.

What do you guys think about CouchDB as a backend?
Do you know good articles or papers on sync without conflicts? I need an easy to use (for the user) mechanism to avoid or resolve conflicts.

@ashim-kr-saha

Syncing with CouchDB is exactly what I am looking for in my next project.

A PouchDB implementation in Dart would be a great solution.

@frank06

frank06 commented Mar 12, 2020

Used CouchDB a long time ago, and while the conflict resolution mechanism was neat, queries were a pain. That might have changed, or not. I thought one of the major drivers of this Hive rewrite was query support.

@simc
Member Author

simc commented Mar 12, 2020

I thought one of the major drivers of this Hive rewrite was query support.

Yes, the queries you use should be more or less independent of CouchDB.

CouchDB is just an idea and nothing I have decided yet. I just want to figure this out before the first stable release of the new version.

It would be great if someone knows a backend which fits our use case.

@jamesdixon

Unsolicited advice / thought:

I realize you're planning for the future ahead of time by factoring in sync support, but it's a complex topic and something I personally would leave till after the rewrite 😄

I also say this because I'm not using CouchDB and while I'd love conflict resolution, I haven't found any really easy way to sync CouchDB changes to Postgres. This seems to be a problem for many who may use another database. Limited research, but I just wanted to throw it out there given that my anecdotal evidence suggests that more people are using Postgres/MySQL/etc and if CouchDB support for those is weak, it may not be worth the additional effort upfront.

All that said, killer job with Hive thus far. Excited to see what's next.

@simc
Member Author

simc commented Mar 12, 2020

Thanks for your opinion. Yes it would be cool to have a conflict resolution which is independent of the backend database but I have to do a lot of research because I have no idea how to do it 😆

@jonataslaw

I think it would be prudent to create a second project, "Hive FFI" or something like that. I have 9 applications in production using Hive, and it makes me very afraid to think that users with 300 MB/500 MB of data in Hive may lose everything after a library update.
I feel very enthusiastic about testing Hive with Rust, it must be incredible. However, in my opinion, changing the backend is not okay for a stable library. If it were a pre-release it would be justified, but there are many people who use Hive as a KV store for many things that SP does not do so well. Queries are cool, but even cooler than that would be to maintain compatibility.
I'm following the thread because if Hive changes, maybe I will need to fork this project, but if there is a risk-free way of migration, I fully support the idea.

@simc
Member Author

simc commented Mar 14, 2020

I don't think it will be possible to automatically migrate the data because the two models are not entirely compatible. But I will maintain a branch that contains the current version so you can just continue to use it.

@algodave

@leisim In your vision, will the new Hive still allow creating adapters manually? I'm not using code generation in my project; I'm just defining my own class MyModelHiveAdapter extends TypeAdapter<MyModel>.

@pishguy

pishguy commented Mar 21, 2020

When can this version be released so that we can use it? 💃 💃 💃

@simc
Member Author

simc commented Mar 21, 2020

In your vision, will the new Hive still allow creating adapters manually?

@algodave I don't think it will be possible in the same way as it is currently because in order to query your data, Hive needs to understand its structure. Probably there will still be adapters that map objects to Map<int, dynamic>. The keys of this map will be the field ids and the values are the primitive values of the fields (int, double, bool, String, List<int>, List<double>, List<bool>, List<String>). You can customize these adapters.
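
A minimal sketch of what such an adapter could look like, assuming a hypothetical FieldAdapter interface (the name and its two methods are made up for illustration; the real API may look different). The field ids 0 and 1 follow the Person model from the first post.

```dart
class Person {
  final String name;
  final int age;
  const Person(this.name, this.age);
}

// Hypothetical adapter interface: objects become a map of
// field id -> primitive value, so the database can understand the structure.
abstract class FieldAdapter<T> {
  Map<int, dynamic> toFields(T object);
  T fromFields(Map<int, dynamic> fields);
}

class PersonAdapter implements FieldAdapter<Person> {
  @override
  Map<int, dynamic> toFields(Person object) =>
      {0: object.name, 1: object.age};

  @override
  Person fromFields(Map<int, dynamic> fields) =>
      Person(fields[0] as String, fields[1] as int);
}

void main() {
  final adapter = PersonAdapter();
  final fields = adapter.toFields(const Person('David', 42));
  print(fields); // prints {0: David, 1: 42}

  final restored = adapter.fromFields(fields);
  print('${restored.name}, ${restored.age}'); // prints David, 42
}
```

Because the map only carries primitive values keyed by field id, an index on field 0 or a filter on field 1 can be evaluated without knowing anything about the Person class itself.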

When can this version be released so that we can use it?

@MahdiPishguy It will probably take another month until I have the first test version.

@xylobol

xylobol commented Apr 9, 2020

@jonataslaw

I'm following the thread because if Hive changes, maybe I will need to fork this project, but if there is a risk-free way of migration, I fully support the idea.

I've been working on a mission-critical project with Hive, and a major pull was that it's completely written in Dart, so I may need to fork as well. If you're interested, I can keep you posted.

@stefanrusek

Might I request that you create a new library? A complete rewrite with different behavior and large API changes is not a new version but a new library. Going down this path means there will be numerous forks of Hive 1, and people would just be better served if you started a new project and let Hive continue to evolve.

@algodave

@xylobol @jonataslaw I am one of those who would be interested in a fork

@listepo

listepo commented Apr 19, 2020

@leisim any news?

@stevenspiel

@leisim I'm also interested in the progress on this.

@simc
Member Author

simc commented Feb 14, 2021

@yringler Yes I'm quite happy with the progress. According to my tests and benchmarks, Isar will probably solve all problems people have with Hive currently.

@dgandhi17

@leisim Can we expect a stable version by the end of this month?

@simc
Member Author

simc commented Feb 15, 2021

@dgandhi17 Stable, probably not, because that implies that it has been battle-tested. You can expect a beta version within the next few weeks.

@michalisioak

michalisioak commented Feb 20, 2021

A noob opinion:
This may sound silly, but what if we made Hive a system service that talks to Dart clients (and maybe other languages) through IPC (interprocess communication)?

Advantages (as I see them)

  1. less storage taken
  2. later support for having the same database stored on all user devices
  3. modularization

Disadvantages (as I also see them)

  1. more complicated (the client must be different from the service)
  2. I don't know how to implement this
  3. security (in my head, every application would have a hardcoded key in the app, something like containerization but for the database)

Feel free to correct me and point out whether this is possible.

@ryanheise

Apart from queries, Hive has another problem: Dart objects use a lot of RAM. Since Hive currently keeps at least the keys in RAM, you can hit the OS memory limit of mobile devices quite fast.

Casual observer here, but I wonder whether Dart compiler improvements are having/will have any impact.

The new compiler in SDK 2.12 contains memory and performance optimisations when code is compiled with sound null safety.

Some other enhancements on the roadmap:

https://github.com/dart-lang/sdk/projects/23#card-55259467

@Hassico

Hassico commented Feb 28, 2021

Humble Hive user here. Is it possible to give data entries or Hive files options such as max age, stale, ... like the ones we find in caching properties?

@erf

erf commented Mar 15, 2021

I only need a DB solution for my Dart server project. Have you thought about making a pure Dart wrapper around LMDB as a side project, to avoid bloat from including IndexedDB etc.?

I'm using Redis now, but I suspect LMDB would be faster.

@erf

erf commented Mar 15, 2021

Maybe some benchmarks for Isar would be interesting, comparing it with some other popular DB solutions, including Hive, Redis, sembast, sqflite etc.

@simc
Member Author

simc commented Mar 16, 2021

@erf Dart is very good at tree shaking unused code, so the IndexedDB wrapper will never be included in your mobile app build.

Redis is an in-memory database so it's likely faster than anything else. I'm not sure benchmarks are very useful here because the databases are completely different. That being said I expect Isar to be faster than all other databases currently available for Flutter.

Currently I have more important tasks (unit tests 🙄) than writing a benchmark but maybe someone from the community will write one.

@wesleytoshio

Hello, is it already possible to use custom queries? or sort by date as you said at the beginning of the post?

@fzyzcjy

fzyzcjy commented Oct 6, 2021

Hi friends! As is suggested here, maybe this package is helpful: https://github.com/fzyzcjy/flutter_rust_bridge

@bsutton

bsutton commented Dec 9, 2021

I have to say that if this makes Hive harder to deploy then I would be against it.
A multi-language project will also make it harder to recruit contributors.

Hive performance is OK, but it's missing decent indexing, and of course memory usage is a problem.

These could be solved by Hive implementing its own disk-based B-tree indexes in Dart.
This would fix most of the memory problems (no longer need to load all keys/objects into memory).
This could be done as a new type of box: IndexedBox.
Existing users could still use an in-memory Box, a LazyBox or an IndexedBox.

Use metadata to define the indexes and, as you have suggested, the 'where' methods to access the indexes.
It would be nice if the filter command could automatically use indexes, but that would be a lot trickier. For an IndexedBox, ideally the filter would page keys/objects into memory rather than loading the whole set.

I found this implementation of a B-tree in Java which would port easily across to Dart.

https://github.com/myui/xbird/blob/master/xbird-open/main/src/java/xbird/storage/index/BIndexFile.java

I would suggest that this would also be much less work than the proposed Rust path.

If I had to do additional work to ship with Hive (i.e. deploying a binary) then I wouldn't have chosen Hive.

FWIW: I build and deploy cli tooling that needs to work across linux/macos/windows.

@simc
Member Author

simc commented Dec 10, 2021

@bsutton The only additional work needed to deploy the Isar binaries is to add:

dependencies:
  isar_flutter_libs: any

Dart lacks mmap and threads and is therefore not a good language for a database. Please read the first post for a more detailed explanation.

@themisir
Contributor

Dart lacks mmap and threads and is therefore not a good language for a database. Please read the first post for a more detailed explanation.

Just want to add that, from what I can see, there's no interest in adding proper threading support to Dart. The team prefers to improve their Isolate implementation, which limits what can be done compared with threads, lacks shared memory, and involves serialization to move statically typed data between isolates.

@bsutton

bsutton commented Dec 10, 2021

@leisim I'm not actually doing Flutter development but rather server-side work.

Does the Flutter lib work for non-Flutter apps?

Are all the noted platforms supported?

@simc
Member Author

simc commented Dec 11, 2021

@bsutton I don't recommend using any embedded database on the server. You're almost always better off using a dedicated database.

@bsutton

bsutton commented Dec 12, 2021 via email

@om-ha

om-ha commented Dec 18, 2021

@bsutton The only additional work needed to deploy the Isar binaries is to add:

dependencies:
  isar_flutter_libs: any

Dart lacks mmap and threads and is therefore not a good language for a database. Please read the first post for a more detailed explanation.

Your work is hugely appreciated Simon.

I think Isar complements Hive. Hive could be used instead of shared preferences for simple stuff and Isar could be used for large queryable data.

EDIT: One might ask, why not use shared_preferences instead of hive? Well my obviously non-affiliated answer to that is:

  • performance benchmarks
  • multiple separate boxes (for separation of concern)
  • boxes run on a separate isolate (internally, IsolatedBox)
  • encryption
  • etc...

@simc
Member Author

simc commented Dec 18, 2021

I think Isar complements Hive

Fully agree! It's not intended to be a replacement.

@simc
Member Author

simc commented Jan 1, 2022

Isar is stable now. Thanks everyone for your help and participation!
