Roadmap
Schedule for future Ocelot releases.
- see vanaheimr
- PTX instrumentation
- AMD Southern Islands support
- Top-level SCons build system, nightly regression tests.
- PTX Execution targets: CPU, NVIDIA, AMD, PTX Emulation
- CUDA Runtime API Implementation
- PTX Internal Representation and Parser
- Trace generators
- Compiler analysis framework
- PTX instrumentation tools
Ocelot is a research vehicle under continuous development, so these components are subject to bug fixes and enhancements. We strive to maintain the invariant that the head revisions of these components always compile and pass existing unit tests on the supported platforms. Contributors must avoid breaking these components and are responsible for modifying affected areas prior to committing.
- Harmony interface
- Vectorized CPU target
- Timing models and PTX simulation
- Ocelot debugging tools
- AMD Backend
These ongoing enhancements to Ocelot explore various research interests of the corresponding contributors.
Integration With Harmony
Build Ocelot into Harmony as the default method for launching a Harmony kernel. Move some of the dynamic components out of Harmony and use them in the Ocelot implementation of the CUDA Runtime API to dynamically select a GPU, emulated, or LLVM CUDA device.
- Harmony Should Execute Kernels Using Ocelot Devices
  * Create a new Harmony kernel class that contains only PTX source.
  * Use Ocelot to dynamically translate the kernel to a GPU kernel, an emulated kernel, or an LLVM kernel.
- Runtime Device Selection
  * Create a dynamic CUDA device in Ocelot.
  * Use Harmony performance prediction to determine which underlying device to execute each kernel on.
- Kernel Dependency Resolution
  * Build basic pointer analysis support into Ocelot.
  * Map cudaMalloc memory allocations to Harmony variables.
  * Track dependencies between CUDA kernels.
  * Execute them with the Harmony runtime.
- Kernel Online Optimization
  * Use Harmony to predict the performance improvement of specific optimizations.
  * Dynamically recompile PTX kernels using the best optimizations.
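To make the Kernel Dependency Resolution item more concrete, here is a minimal C++ sketch of how allocations could be mapped to variables and how dependencies between kernel launches could be derived from them. It is not Ocelot or Harmony code: `DependencyTracker`, `KernelLaunch`, `registerAllocation`, `findAllocation`, and `addLaunch` are hypothetical names, addresses are modeled as plain integers, and the read and write sets that pointer analysis would produce are assumed to be supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one kernel launch. In a real system the
// read/write pointer sets would come from pointer analysis of the PTX;
// here they are given explicitly (an assumption of this sketch).
struct KernelLaunch {
    std::string name;
    std::vector<std::uintptr_t> reads;   // pointers the kernel reads through
    std::vector<std::uintptr_t> writes;  // pointers the kernel writes through
};

class DependencyTracker {
public:
    // Record an allocation (as if intercepted from cudaMalloc) and return
    // the id of the "variable" it is mapped to.
    int registerAllocation(std::uintptr_t base, std::size_t size) {
        assert(size > 0);
        const int id = static_cast<int>(allocations_.size());
        allocations_[base] = Entry{size, id};
        return id;
    }

    // Map a pointer, possibly into the middle of an allocation, to its
    // variable id. Returns -1 if the pointer is not inside any allocation.
    int findAllocation(std::uintptr_t ptr) const {
        auto it = allocations_.upper_bound(ptr);
        if (it == allocations_.begin()) return -1;
        --it;  // last allocation whose base is <= ptr
        return (ptr - it->first < it->second.size) ? it->second.id : -1;
    }

    // Register a launch and return the indices of earlier launches it must
    // wait for (read-after-write, write-after-write, write-after-read).
    std::set<int> addLaunch(const KernelLaunch& k) {
        const int self = launchCount_++;
        std::set<int> deps;
        for (std::uintptr_t p : k.reads) {
            const int a = findAllocation(p);
            if (a < 0) continue;
            auto w = lastWriter_.find(a);
            if (w != lastWriter_.end()) deps.insert(w->second);
        }
        for (std::uintptr_t p : k.writes) {
            const int a = findAllocation(p);
            if (a < 0) continue;
            auto w = lastWriter_.find(a);
            if (w != lastWriter_.end()) deps.insert(w->second);
            auto r = readers_.find(a);
            if (r != readers_.end()) deps.insert(r->second.begin(), r->second.end());
        }
        // Update the per-variable state only after all dependencies are known.
        for (std::uintptr_t p : k.reads) {
            const int a = findAllocation(p);
            if (a >= 0) readers_[a].insert(self);
        }
        for (std::uintptr_t p : k.writes) {
            const int a = findAllocation(p);
            if (a < 0) continue;
            lastWriter_[a] = self;
            readers_[a].clear();
        }
        return deps;
    }

private:
    struct Entry {
        std::size_t size;
        int id;
    };
    std::map<std::uintptr_t, Entry> allocations_;  // base address -> entry
    std::map<int, int> lastWriter_;                // variable -> last writing launch
    std::map<int, std::set<int>> readers_;         // variable -> readers since last write
    int launchCount_ = 0;
};
```

With two registered allocations, a launch that writes the first has no dependencies, a later launch that reads the first and writes the second depends on that writer, and a third launch that overwrites the first allocation depends on both earlier launches. The resulting sets describe a dependency graph that a runtime such as Harmony could then schedule; how that hand-off would actually work is outside the scope of this sketch.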