Concurrency? #40

Open
raygit83 opened this issue Apr 2, 2024 · 5 comments

Comments

@raygit83

raygit83 commented Apr 2, 2024

So far I don't see any measures to handle concurrent access from different clients to the same db file. Is this deliberate? Do you plan to add locked r/w access?

@donjajo
Owner

donjajo commented Apr 3, 2024

Hello @raygit83

Yes, there should be support for concurrency, but right now I don't have a full picture of the implementation. My first thought is that we can't block the entire file for an r/w operation. I am thinking of implementing locking at the row level, but I haven't fully thought this through yet. I will think about it further.

If you have an idea, please share.

@raygit83
Author

raygit83 commented Apr 3, 2024

Hi @donjajo,

Thanks for your response. And yes, locking the file during r/w access isn't enough to ensure integrity. Consider the following example:

  1. Two clients perform a read access to the db.
  2. Both clients perform an arbitrary edit on the same data.
  3. Both clients commit their changes at different times.

Even with locked file access, one of the edits made in step 2 will be lost: whichever client gets to perform the write last overwrites the other client's changes, because those changes were never reflected in its own copy of the data.
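
To make the race concrete, here's a minimal sketch of the read-modify-write pattern that loses an update (the file name and structure are made up, it's not php-jsondb code):

```php
<?php
// Minimal sketch of the lost-update race. Both clients run this same
// read-modify-write in separate requests.

$file = 'db.json';

// 1. Read the whole db (both clients do this and get identical copies).
$data = json_decode(file_get_contents($file), true);

// 2. Edit the same record in memory (each client makes a different edit).
$data['users'][0]['email'] = 'client-a@example.com';

// ...meanwhile the other client writes its own full copy of the file...

// 3. Write the whole db back: whoever writes last silently discards the
//    other client's change, even if steps 1 and 3 each held a file lock.
file_put_contents($file, json_encode($data, JSON_PRETTY_PRINT));
```

Locking steps 1 and 3 individually doesn't help, because the dangerous window is the time between them.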

Here are some thoughts:

The very popular SleekDB solves this by creating a file for every single entry in the table and maintaining a cache/index to allow for reasonably performant queries. One issue with that is that you might hit the maximum number of files permitted on your host's filesystem rather quickly.

Another option would be reading back the db, then parsing and merging the data inside a locked scope before every write access, with read access locked against that same scope; a sketch follows below. This would be safe, but it would quickly cause a performance hit depending on the size and complexity of your data.
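
A rough sketch of that, assuming a single JSON file and flock(); update_db() and the $apply callback are just illustrative names:

```php
<?php
// Locked read-merge-write: re-read the current state, apply the edit to it,
// and write it back, all inside one exclusive lock.

function update_db(string $file, callable $apply): void
{
    $fp = fopen($file, 'c+');          // create if missing, don't truncate
    if ($fp === false) {
        throw new RuntimeException("Cannot open $file");
    }

    flock($fp, LOCK_EX);               // writers block each other here

    $raw  = stream_get_contents($fp);
    $data = ($raw === '' || $raw === false) ? [] : json_decode($raw, true);

    $data = $apply($data);             // re-apply the edit against the current state

    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, json_encode($data, JSON_PRETTY_PRINT));
    fflush($fp);

    flock($fp, LOCK_UN);
    fclose($fp);
}
```

Reads would take LOCK_SH on the same file so they never observe a half-written state; the cost is re-reading and re-encoding the whole db on every write.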

Perhaps a mixture of both of these options would work, where the entries are stored in chunks of, say, up to 1024 entries per file, each file being locked in the manner discussed above. For a query one would then have to iterate these files, or they could be organized in some structured manner to speed things up; a sketch follows below.
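
Something along these lines, with the directory layout, names and constant being assumptions for illustration only:

```php
<?php
// Chunked layout: entries live in page files of up to ENTRIES_PER_PAGE rows,
// and a write only locks the one page it touches.

const ENTRIES_PER_PAGE = 1024;

function page_path(string $dir, int $id): string
{
    return $dir . '/page_' . intdiv($id, ENTRIES_PER_PAGE) . '.json';
}

function update_entry(string $dir, int $id, array $fields): void
{
    $fp = fopen(page_path($dir, $id), 'c+');
    flock($fp, LOCK_EX);                           // only this page is blocked

    $page = json_decode(stream_get_contents($fp), true) ?: [];
    $page[$id] = array_merge($page[$id] ?? [], $fields);

    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, json_encode($page));

    flock($fp, LOCK_UN);
    fclose($fp);
}
```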

Best,
Ray

@donjajo
Owner

donjajo commented Apr 5, 2024

Thank you for your insights!

What about creating something like a mutex, blocking each session until another one has finished its r/w operation? Do you think this would cause a performance issue? Theoretically, I can use a named pipe for this. My only concern is that with a lot of traffic, each session would be waiting for another. Of course, the lock operation itself would be very fast.
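
Something like this, as a sketch of the blocking behaviour (using an exclusive flock() on a dedicated lock file rather than a named pipe; names are arbitrary):

```php
<?php
// One global lock: every session blocks until the current r/w operation
// has finished.

function with_db_lock(callable $work)
{
    $fp = fopen(__DIR__ . '/db.lock', 'c');
    flock($fp, LOCK_EX);        // every other session blocks here until release

    try {
        return $work();         // the actual r/w operation on the db file
    } finally {
        flock($fp, LOCK_UN);
        fclose($fp);
    }
}

// Usage:
// $rows = with_db_lock(fn () => json_decode(file_get_contents('db.json'), true));
```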

@raygit83
Author

raygit83 commented Apr 6, 2024

Hi James,

Yes, a mutex will be required. It seems that PHP has a built-in mutex class these days (see https://www.php.net/manual/en/class.syncmutex.php), but I don't know how reliable it is. I have used mkdir() to implement a mutex in PHP in the past, because it's atomic.

Anyway, locking the entire session would be safe, but rather brutal imo, so I'd go for something more fine-grained, as discussed above. I'll try to sketch something next week.
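
For reference, the mkdir() approach I mentioned looks roughly like this (lock path, timeout and back-off are arbitrary choices):

```php
<?php
// mkdir() is atomic, so whichever process creates the directory first
// holds the lock; everyone else retries until it is released.

function acquire_lock(string $dir, float $timeout = 5.0): bool
{
    $deadline = microtime(true) + $timeout;
    while (!@mkdir($dir)) {            // returns false if the dir already exists
        if (microtime(true) > $deadline) {
            return false;              // could not get the lock in time
        }
        usleep(10000);                 // back off for 10 ms and retry
    }
    return true;
}

function release_lock(string $dir): void
{
    @rmdir($dir);
}
```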

Best,
Ray

@raygit83
Author

raygit83 commented Apr 8, 2024

Hi James,

Here's an example implementation outlining my idea: https://github.com/raygit83/json_db.class/tree/main

The database is split into "pages" with, e.g., at most 2048 entries each. There's an auto-incremented id by which items can be accessed and which is used to determine the page an entry is stored in. There's also a find() function which iterates the pages and uses PHP's array_filter() function to query specific items. One could use an index/cache mechanism to speed this up, but it should be possible to find a good balance in terms of performance by setting an appropriate value for ITEMSPERPAGE.

Locking is performed at the page level. This keeps concurrent access safe while only ever blocking the one page being written; a sketch follows below.
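
To illustrate the idea, here's a simplified sketch (not the exact code from the linked repo) of how find() walks the pages:

```php
<?php
// find() iterates the page files, takes a shared lock on each one, and lets
// array_filter() apply the caller's predicate.

function find(string $dir, callable $predicate): array
{
    $results = [];

    foreach (glob($dir . '/page_*.json') ?: [] as $path) {
        $fp = fopen($path, 'r');
        flock($fp, LOCK_SH);                     // readers don't block each other

        $page = json_decode(stream_get_contents($fp), true) ?: [];

        flock($fp, LOCK_UN);
        fclose($fp);

        $results += array_filter($page, $predicate);   // keep ids as keys
    }

    return $results;
}

// Usage:
// $adults = find('db/users', fn ($row) => $row['age'] >= 18);
```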

Perhaps a similar backend can be worked into your class, which has a more sophisticated interface for accessing and querying records.

Best,
Ray
