How to share memory across threads? #332

Open
camille-chelpi opened this issue Dec 2, 2024 · 7 comments

camille-chelpi commented Dec 2, 2024

How can I share an object or array across threads without copying it through a channel or future, in order to minimise memory usage?
With pthreads (PHP 7.2) we could use Threaded, but this seems to be impossible with parallel?

<?php

$big_storage_elt = array();

$threads = array();
for($i=0;$i<5;$i++)
{
    $thread = new \Parallel\Runtime();
    $threads[] = $thread->run(function($index,$shared_array){

        $shared_array[$index] = 'value '.$index;

        return true;

    },array($i,$big_storage_elt));
}

//waiting end of each thread
foreach($threads as $thread) $thread->value();

print_r($big_storage_elt);

I would like to get this result:

Array
(
    [0] => value 0
    [1] => value 1
    [2] => value 2
    [3] => value 3
    [4] => value 4
)

camille-chelpi changed the title from "How to share across threads?" to "How to share memory across threads?" on Dec 2, 2024

realFlowControl (Collaborator) commented

Hey @camille-chelpi 👋

You are correct, this won't work anymore. You can read about the main change in philosophy from ext-pthreads to ext-parallel at https://www.php.net/manual/de/philosophy.parallel.php

One of the reasons is visible in your code. When the threads access $shared_array[$index], you need some way to synchronise their access to the shared memory. Otherwise you get at best a race condition (if the individual operations are atomic) and at worst undefined behaviour. Generally we try to avoid both 😉

Hope this helps.

camille-chelpi commented Dec 2, 2024

I have this solution, but it's not nice at all: I can't manage the concurrency between unserialize() and serialize() inside a thread, and it also multiplies the memory used by the number of threads.

<?php

$big_storage_elt = new \Parallel\Sync(serialize(array()));

$threads = array();
for($i=0;$i<5;$i++)
{
    $thread = new \Parallel\Runtime();
    $threads[] = $thread->run(function($index,$shared_array){

        $datas = unserialize($shared_array->get());
        $datas[$index] = 'value '.$index;
        $shared_array->set(serialize($datas));

        return true;

    },array($i,$big_storage_elt));
}
foreach($threads as $thread) $thread->value();

print_r(unserialize($big_storage_elt->get()));
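
One possible way to close the race between unserialize() and serialize() in the snippet above is to run the whole read-modify-write step as a critical section by invoking the Sync object with a callable (parallel\Sync::__invoke). This is a minimal sketch, not a confirmed recommendation from the maintainers: it assumes get() and set() can be called inside the critical section without deadlocking, so verify it against your installed ext-parallel version. It only addresses the lost-update problem. Each thread still unserializes its own copy, so it does not reduce memory usage.

```php
<?php

// Sync can only hold scalars, so the array is stored as a serialized string.
$storage = new \parallel\Sync(serialize([]));

$futures = [];
for ($i = 0; $i < 5; $i++) {
    $runtime = new \parallel\Runtime();
    $futures[] = $runtime->run(function ($index, $storage) {
        // Invoking the Sync object runs the callable exclusively, so no other
        // thread can interleave between the get() and the set() below.
        // Assumption: get()/set() are safe to call from inside this callable.
        $storage(function () use ($storage, $index) {
            $data = unserialize($storage->get());
            $data[$index] = 'value ' . $index;
            $storage->set(serialize($data));
        });

        return true;
    }, [$i, $storage]);
}

// Wait for every task to finish.
foreach ($futures as $future) {
    $future->value();
}

// Threads finish in arbitrary order, so sort by key before printing.
$result = unserialize($storage->get());
ksort($result);
print_r($result);
```

Because every update holds the lock for a full unserialize/serialize round trip, the threads effectively run this step one after another, which gets slower as the array grows.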

realFlowControl (Collaborator) commented

IDK the problem you are solving, but this works:

<?php

$big_storage_elt = [];

$threads = array();
for($i=0;$i<5;$i++)
{
    $thread = new \Parallel\Runtime();
    $threads[] = $thread->run(function($index){
        return [$index, 'value '.$index];

    },[$i]);
}

//waiting end of each thread
foreach($threads as $thread) {
    list($k, $v) = $thread->value();
    $big_storage_elt[$k] = $v;
}

print_r($big_storage_elt);

realFlowControl (Collaborator) commented

Perhaps you would also like to take a look at the code in this repo: https://github.com/realFlowControl/1brc

camille-chelpi (Author) commented

Quoting realFlowControl: "IDK the problem you are solving, but this works:" (followed by the code from the comment above)

Yes, but using futures and channels costs much more memory and time when you deal with a huge array.

camille-chelpi commented Dec 2, 2024

It would be perfect if \parallel\Sync had methods like these, which could also handle arrays as values:
public function set($key, $value);
public function get($key);

Something to study, I guess.

realFlowControl commented Dec 2, 2024

I get your point and I'd like to learn more about the issue, for example how big the $big_storage_elt array will grow (size in bytes, number of entries, ...). I do realise that the data each thread returns in the 1brc code is not much, even though the input file is 13 GB in size. Have you run a profiler to see how large the impact of copying memory from one thread to another and combining intermediate results is? Do you hit the memory limits of your machine/container?
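
Before reaching for a profiler, a rough first estimate of the copy cost can be taken with plain PHP. The sketch below is only a proxy under stated assumptions: build_sample() and measure_copy_cost() are hypothetical helpers, the sample data is made up, and a serialize()/unserialize() round trip stands in for the copy that happens when a value crosses a thread boundary. ext-parallel copies values with its own mechanism, so the real numbers will differ.

```php
<?php

// Hypothetical stand-in for the real data: $entries short strings.
function build_sample(int $entries): array
{
    $data = [];
    for ($i = 0; $i < $entries; $i++) {
        $data[$i] = 'value ' . $i;
    }
    return $data;
}

// Measures the extra memory and wall-clock time of one full deep copy.
function measure_copy_cost(array $data): array
{
    $before = memory_get_usage();
    $start  = hrtime(true);

    // Round trip forces a real deep copy (no copy-on-write sharing).
    $copy = unserialize(serialize($data));

    $elapsedMs = (hrtime(true) - $start) / 1e6;
    $bytes     = memory_get_usage() - $before; // $copy is still alive here

    return ['entries' => count($copy), 'bytes' => $bytes, 'ms' => $elapsedMs];
}

$cost = measure_copy_cost(build_sample(100000));
printf(
    "%d entries: about %.1f MiB extra, %.1f ms per copy\n",
    $cost['entries'],
    $cost['bytes'] / (1024 * 1024),
    $cost['ms']
);
```

Multiplying the per-copy figure by the number of threads gives an upper bound to compare against the memory limit of the machine or container.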
