
Prevent Multiple Culvert Instances From Running #48

Open
iwoloschin opened this issue Dec 14, 2023 · 2 comments

Comments

@iwoloschin
Contributor

After a very silly mixup of which host I was on I accidentally ran two instances of culvert in parallel. One was flashing the firmware and the other triggered a WDT reset. This left my system in a pretty broken state! Fortunately this was in my lab so it was easily rectified, but it made me wonder if culvert could do something to prevent multiple instances from running in parallel?

I assume the right way to do this is with flock? I'd probably just wrap the entire program (somewhere in https://github.com/amboar/culvert/blob/main/src/culvert.c?) instead of specific subcommands, that would prevent my scenario from occurring again.

I'm open to attempting this myself, but wanted to see if anyone else had opinions or thinks this is a terrible idea before I look at writing any code!

@amboar
Owner

amboar commented Dec 14, 2023

Okay, some immediate thoughts: there are at least three layers at which we should be locking. We need locking over:

  1. Bridge operations
  2. The fingerprinting operation
  3. Use of the SoC

For the latter, on the AST2400 and AST2500 we can make use of SCU40[1], the "BMC Firmware Protection" bit ("Forbid SOCFlash to update flash (support from SOCFLASH v1.02.01)"). On the AST2600 this moves to SCU100[1]. The differing locations require that we perform the SoC locking after we've fingerprinted the address space and loaded the appropriate devicetree. This drives the need to lock while acquiring coherent fingerprint data, as that requires multiple transactions via the chosen bridge. But then we need to lock while driving the bridges, as they're often indirect and require multiple register accesses to complete a transaction, which will conflict in the case of multiple culvert processes.

I don't want to do a big lock up front around all of culvert as that requires unpicking all the assumptions later when we decide we need to get rid of it. We've seen this mistake before with the Big Kernel Lock, so let's dodge that trap from the start. Perhaps you could argue that the BMC Firmware Protection bit falls into this bucket too, but we don't have a more fine-grained coherent way of managing mutual exclusion without resorting to modifying memory (perhaps we could abuse SRAM like the trace command?).

For the bridge lock we should use flock(2) with LOCK_EX on something like /run/lock/culvert.${bridgename}. For the fingerprint lock we can use something like /run/lock/culvert.fingerprint (again with flock(2) and LOCK_EX).

@amboar
Owner

amboar commented Dec 15, 2023

Actually, we could probably get away without the fingerprint lock; it's not achieving much, as the operations are all reads.
