Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add condition variables to pico_sync (fixes #1093) #1101

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

pguyot
Copy link

@pguyot pguyot commented Nov 16, 2022

This PR provides condition variables as companion to mutexes.

It is implemented without any assumption on the number of cores.
Like mutexes, condition variables are protected by a spinlock. Because the spinlock could be the same as the one used by the associated mutex, the two cases are implemented to avoid a deadlock trying to acquire the two spinlocks.

When waiting on a condition variable, a core tries to be the waiter (=owner) and when it is, it waits to be signaled.
When signaling a condition variable, the caller verifies that there is a waiter, and if there is, sets a boolean to signal it.

This busy-loop implementation seems to be immune from spurious wakeup.

@alastairpatrick
Copy link
Contributor

I'm just a contributor so please take this with a "pinch of salt" and I suggest waiting for someone with authority before acting on any of this.

First of all, if this is to be called a condition variable module, I think it would be reasonable for an SDK user to expect it to also include a cond_broadcast() or similarly named function that signals all waiting threads. If an RTOS is in use, there could be >1 waiting threads.

I appreciate that implementing cond_broadcast() is a challenge in the context of the SDK, in part because there is nothing resembling a thread control block that could be used to efficiently implement a linked list of threads waiting for a particular condition variable. I have two suggestions.

My first suggestion is a (hopefully) upcoming thread local variable module for the SDK, which essentially provides a thread control block by another name.

My second suggestion is to add a broadcast_count variable to cond_t, which atomically increments whenever cond_broadcast() is called. I've never implemented condition variables this way and I wonder if it's flawed but I think it might give waiting threads enough information to all wake up on broadcast:

typedef struct __packed_aligned
{
    lock_core_t core;
    lock_owner_id_t priority_waiter;  // arbitrary one of N threads that will wake up on cond_signal().
    uint64_t broadcast_count;  // all N threads will wake up when this changes.
    bool signaled;
} cond_t;

Finally, I think there are some issues with condition variables as implemented.

There are three cases where cond_wait() blocks by calling lock_internal_spin_unlock_with_wait():

  1. A thread blocks until the cond_t is signalled.
  2. A thread blocks until the cond_t has no current waiter.
  3. A thread blocks until the mutex_t is released.

I use the term "blocked" here, and in the operating system sense, because that is exactly what will happen when an RTOS is in use: the calling thread will transition to a blocked state. In order to transition the thread out of the blocked state, another thread must call lock_internal_spin_unlock_with_notify().

Of the three, case 1 is covered by the call to lock_internal_spin_unlock_with_notify() in cond_signal() and I believe this case is fully covered. I think there are issues with cases 2 and 3 though.

In case 2, I there's no notification when cond->waiter becomes LOCK_INVALID_OWNER_ID so a thread waiting to become the current waiter might never wake up.

Case 3 is partially covered by the call to lock_internal_spin_unlock_with_notify() in mutex_exit(). However, code in cond_wait() also releases the mutex by bypassing the public mutex API and setting mtx->owner = LOCK_INVALID_OWNER_ID. So, like case 2, a thread waiting for the mutex to be released might never wake up.

@pguyot
Copy link
Author

pguyot commented Nov 17, 2022

Thank you.

  • I wasn't clear about the notify semantics and obviously got it wrong
  • I don't need broadcast for my use case, but thought of an implementation slightly different from your suggestion. I'll give it more thought, I like yours of using a counter that is unlikely to overflow. Maybe 32 bits would be sufficient.

@alastairpatrick
Copy link
Contributor

Perhaps it's slightly less efficient but I think cond_wait() is clearer refactored like this. I haven't attempted to fix any of the issues I mentioned above.

void __time_critical_func(cond_wait)(cond_t *cond, mutex_t *mtx) {
    lock_owner_id_t caller = lock_get_caller_owner_id();
    uint32_t save = save_and_disable_interrupts();
    spin_lock_unsafe_blocking(mtx->core.spin_lock);
    assert(lock_is_owner_id_valid(mtx->owner));

    if (mtx->core.spin_lock != cond->core.spin_lock) {
        spin_lock_unsafe_blocking(cond->core.spin_lock);
    }
    
    // Wait to be the waiter first, then wait for the signal.
    if (lock_is_owner_id_valid(cond->waiter)) {
        mtx->owner = LOCK_INVALID_OWNER_ID;
        spin_unlock_unsafe(mtx->core.spin_lock);
        do {
            if (!lock_is_owner_id_valid(cond->waiter)) {
                cond->waiter = caller;
                break;
            }
            lock_internal_spin_unlock_with_wait(&cond->core, save);
            save = spin_lock_blocking(cond->core.spin_lock);
        } while (true);
    } else {
        cond->waiter = caller;
        mtx->owner = LOCK_INVALID_OWNER_ID;
        spin_unlock_unsafe(mtx->core.spin_lock);
    }
    
    // We are the current waiter, now wait for the signal.
    do {
        if (cond->signaled) {
            cond->waiter = LOCK_INVALID_OWNER_ID;
            cond->signaled = false;
            break;
        }
        lock_internal_spin_unlock_with_wait(&cond->core, save);
        save = spin_lock_blocking(cond->core.spin_lock);
    } while (true);
      
    if (mtx->core.spin_lock != cond->core.spin_lock) {
        spin_unlock_unsafe(cond->core.spin_lock);
    }
    
    do {
        if (!lock_is_owner_id_valid(mtx->owner)) {
            mtx->owner = caller;
            spin_unlock(mtx->core.spin_lock, save);
            break;
        }
        lock_internal_spin_unlock_with_wait(&mtx->core, save);
        save = spin_lock_blocking(mtx->core.spin_lock);
    } while (true);
}

@kilograham kilograham added this to the 1.6.0 milestone Feb 7, 2023
@pguyot pguyot force-pushed the w46/condition-variables branch 2 times, most recently from 58fa5fb to 26d7860 Compare April 9, 2023 15:18
@pguyot
Copy link
Author

pguyot commented Apr 9, 2023

@alastairpatrick Thank you for your feedback. I eventually implemented broadcast as well as timed waits.

pguyot added a commit to pguyot/AtomVM that referenced this pull request Apr 9, 2023
Also slightly modify SMP code to handle the specific case of the Pico.

Code relies on condition variables implementation proposed in this upstream PR:
raspberrypi/pico-sdk#1101

Signed-off-by: Paul Guyot <[email protected]>
@kilograham
Copy link
Contributor

Thanks for the update; I haven't really had a chance to look at this in detail; I do note however that the latest commit uses "_sev()" in some path which is not valid under a RTOS (it is what lock_internal_spin_unlock_with_notify is there for). So hopefully things can be refactored to use that method in some way - it seems like the @alastairpatrick version might do that... sorry busy with other stuff atm, so not really engaging brain fully!

@pguyot
Copy link
Author

pguyot commented Apr 9, 2023

@kilograham Thank you for this feedback.

The __sev() call was here to solve an issue raised by @alastairpatrick which wasn't solved by his refactorization proposal, as he wrote. I rewrote the code to use lock_internal_spin_unlock_with_notify instead of spin_unlock_unsafe on the paths where __sev() was called.

pguyot added a commit to pguyot/AtomVM that referenced this pull request Apr 10, 2023
Also slightly modify SMP code to handle the specific case of the Pico.

Code relies on condition variables implementation proposed in this upstream PR:
raspberrypi/pico-sdk#1101

Signed-off-by: Paul Guyot <[email protected]>
pguyot added a commit to pguyot/AtomVM that referenced this pull request Apr 10, 2023
Also slightly modify SMP code to handle the specific case of the Pico.

Code relies on condition variables implementation proposed in this upstream PR:
raspberrypi/pico-sdk#1101

Signed-off-by: Paul Guyot <[email protected]>
pguyot added a commit to pguyot/AtomVM that referenced this pull request Apr 10, 2023
Also slightly modify SMP code to handle the specific case of the Pico.

Code relies on condition variables implementation proposed in this upstream PR:
raspberrypi/pico-sdk#1101

Signed-off-by: Paul Guyot <[email protected]>
@pguyot pguyot force-pushed the w46/condition-variables branch from 23c4cb5 to 653b354 Compare June 4, 2023 17:09
pguyot added a commit to pguyot/AtomVM that referenced this pull request Jun 4, 2023
Also slightly modify SMP code to handle the specific case of the Pico.

Code relies on condition variables implementation proposed in this upstream PR:
raspberrypi/pico-sdk#1101

Signed-off-by: Paul Guyot <[email protected]>
pguyot added a commit to pguyot/AtomVM that referenced this pull request Jun 4, 2023
Also slightly modify SMP code to handle the specific case of the Pico.

Code relies on condition variables implementation proposed in this upstream PR:
raspberrypi/pico-sdk#1101

Signed-off-by: Paul Guyot <[email protected]>
pguyot added a commit to pguyot/AtomVM that referenced this pull request Jun 5, 2023
Also slightly modify SMP code to handle the specific case of the Pico.

Code relies on condition variables implementation proposed in this upstream PR:
raspberrypi/pico-sdk#1101

Signed-off-by: Paul Guyot <[email protected]>
@kilograham kilograham self-assigned this May 19, 2024
@kilograham kilograham modified the milestones: 1.6.1, 1.7.0 May 19, 2024
@pguyot pguyot force-pushed the w46/condition-variables branch from 653b354 to f593b3a Compare September 21, 2024 06:06
@pguyot
Copy link
Author

pguyot commented Sep 21, 2024

@kilograham this was rebased on top of SDK 2.0 and tested on RP2040, RP2350 ARM and RP2350 RISCV.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants