Either by design or coincidence, certain blocks across modules are meant to be equivalent. Most intentionally, many PothosGPU blocks are specifically designed to be drop-in substitutes for PothosBlocks or PothosComms blocks. In some cases, where an ArrayFire function is poorly optimized (especially when ArrayFire only has the fallback CPU implementation), the new SIMD implementation actually ends up being faster. When gr-pothos blocks come into the picture, who knows? The block_executor overhead isn't as bad as I thought.
My initial thought is, for a given desired functionality, a JSON config listing the block registry path and parameters for each candidate block, plus a tool that runs each block one-by-one into a Probe Rate block to determine the most efficient. Once that has been determined, a block is auto-generated (the conf loader would likely come in somewhere) that automatically uses the fastest implementation. The result would transparently be a single block that, per-DType, dispatches to the fastest registered implementation.
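A rough sketch of what that could look like. The registry paths, config schema, and rate numbers below are all hypothetical placeholders, not real PothosBlocks/PothosComms/PothosGPU entries, and the measured rates here are hard-coded stand-ins for what a Probe Rate block would actually report:

```python
import json

# Hypothetical config: one functionality, several candidate implementations.
# Paths and args are illustrative only.
CONFIG = json.loads("""
{
  "functionality": "add",
  "candidates": [
    {"path": "/blocks/add",       "args": {}},
    {"path": "/comms/arithmetic", "args": {"operation": "ADD"}},
    {"path": "/gpu/arith/add",    "args": {"device": "CPU"}}
  ]
}
""")

def pick_fastest(rates_per_dtype):
    """Given {dtype: {registry_path: rate}}, return {dtype: fastest path}."""
    return {dtype: max(rates, key=rates.get)
            for dtype, rates in rates_per_dtype.items()}

# Stand-in for profiling results; the real tool would feed each candidate
# into a Probe Rate block and record the reported element rate per DType.
measured = {
    "float32": {"/blocks/add": 1.2e8, "/gpu/arith/add": 3.4e8},
    "int16":   {"/blocks/add": 2.1e8, "/gpu/arith/add": 1.9e8},
}

print(pick_fastest(measured))
```

The resulting map is what the auto-generated dispatch block would consume: at construction time it looks up its DType and instantiates the winning registry path.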
ncorgan changed the title from "Profiler for equivalent blocks, automatically use fastest" to "Feature: profiler for equivalent blocks, automatically use fastest" on Mar 20, 2021.