Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REQUEST] Support building single-threaded OpenVDB #1973

Open
hiokazaki opened this issue Nov 27, 2024 · 2 comments
Open

[REQUEST] Support building single-threaded OpenVDB #1973

hiokazaki opened this issue Nov 27, 2024 · 2 comments

Comments

@hiokazaki
Copy link

hiokazaki commented Nov 27, 2024

These days, systems with 100+ hardware threads are increasingly common, but TBB performance on such systems is atrocious; it is particularly bad on non-x86 systems with more advanced memory models that have "light weight" synchronization primitives (that TBB does not use.)

As an example, I just moved a code to a 350-hardware-thread system and performance dropped by 300x compared to my 8-core desktop, because TBB's concurrent hash map is... not the best.

Admittedly, I am creating and destroying lots of ValueAccessors, but it is sad that the reason this is bad is because 99.9% of the time (yes, this is a correctly-rounded measurement 🤣 ) is spent in TBB's concurrent hash map.

Another issue I run into (this one costing another 2x factor of runtime on large systems) is due to the use of std::atomic, mutexes, and so on, which are not free, even when uncontended.

Describe the solution you'd like

It would be ideal if OpenVDB could have a "single threaded" configuration option, so that all uses of atomic ops, mutexes, TBB, etc., are removed - this leaves a clean, faster (potentially much faster) code path for users to manage whatever concurrency they might have on their own.

Failing that, it would be great if the use of TBB could be made optional at configure time - this avoids the dependency and the catastrophically poor performance of TBB on large, non-x86 systems.

Describe alternatives you've considered

Completely removing TBB might be simpler than maintaining it - more loosely-coupled, task-oriented concurrency is a much saner approach for current and future systems, instead of the current fine-grained, broadcast-heavy approach taken in e.g. {Const}AccessorRegistry

@hiokazaki hiokazaki changed the title [REQUEST] Support building OpenVDB without TBB [REQUEST] Support building single-threaded OpenVDB Nov 28, 2024
@danrbailey
Copy link
Contributor

This is definitely something we have considered in the past. I don't think we would be looking to completely strip out TBB, that seems impractical, but I agree with you that it should at least be possible to build with a different threading implementation or to disable it altogether and run single-threaded. We have made a start (of sorts) by consolidating some TBB-related functionality into a new threading header (https://github.com/AcademySoftwareFoundation/openvdb/blob/master/openvdb/openvdb/thread/Threading.h). The first real attempt would likely be to wrap TBB in our own OpenVDB primitives, but even that is a substantial amount of work.

In the short term, something you can try if you like is to use the "unsafe" ValueAccessor that does not register itself with the tree, so doesn't go through the mutex at all:

https://github.com/AcademySoftwareFoundation/openvdb/blob/master/openvdb/openvdb/tree/Tree.h#L681

We also don't make a lot of usage of atomics and other synchronization primitives. Probably the most notable usage is delayed-loading, which is a feature that can be disabled:

https://github.com/AcademySoftwareFoundation/openvdb/blob/master/CMakeLists.txt#L83

@hiokazaki
Copy link
Author

Thank you for these really interesting comments!

Paving the way towards optional, configurable threading seems awesome, I was unaware steps were already being taken in that direction.

I was also unaware of getUnsafeAccessor() - which is brilliant! That resolved the immediate performance bugs I was running into. Since standard C++ containers have the same kind of safety guarantee: multiple concurrent readers are fine, so long as nobody is writing - far from being unsafe, it feels like these are normal accessors and the registry is an optional extra. I suspect that larger applications using OpenVDB as a data structure are going to have their own threading considerations, and so if they're already doing some work to manage concurrency, there's not a lot of extra value in OpenVDB offering its own. (But I understand that other applications making use of OpenVDB's tools on larger datasets could save a lot of time with those tools running in parallel...)

We also don't make a lot of usage of atomics and other synchronization primitives. Probably the most notable usage is delayed-loading, which is a feature that can be disabled:

https://github.com/AcademySoftwareFoundation/openvdb/blob/master/CMakeLists.txt#L83

I see. I had in mind the accessor registry - which, via the TBB concurrent hash map - ends up choking on synchronization.

As a side note, I have found that folly's ConcurrentHashMap (https://github.com/facebook/folly/blob/main/folly/concurrency/ConcurrentHashMap.h) doesn't have TBB's performance bugs, and happily scales to hundreds of threads.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants