Jin Wang edited this page Mar 27, 2015 · 1 revision

Roadmap

Schedule for future Ocelot releases.

Planned for Version 4.x

Planned for Version 3.x

New Features

  • PTX instrumentation
  • AMD Southern Islands support
  • Top-level SCons build system and nightly regression tests

Ocelot Core Components

  • PTX Execution targets: CPU, NVIDIA, AMD, PTX Emulation
  • CUDA Runtime API Implementation
  • PTX Internal Representation and Parser
  • Trace generators
  • Compiler analysis framework
  • PTX instrumentation tools

Ocelot is a research vehicle under continuous development, so these components are subject to bug fixes and enhancements. We strive to maintain the invariant that the head revisions of these components always compile and pass existing unit tests on the supported platforms. Contributors must avoid breaking these components and are responsible for modifying affected areas prior to committing.

Current Projects

  • Harmony interface
  • Vectorized CPU target
  • Timing models and PTX simulation
  • Ocelot debugging tools
  • AMD Backend

These ongoing enhancements to Ocelot explore various research interests of the corresponding contributors.

Ongoing Development

Integration With Harmony

Harmony Figure

Build Ocelot into Harmony as the default mechanism for launching a Harmony kernel. Move some of Harmony's dynamic components into Ocelot's implementation of the CUDA Runtime API, where they are used to select a GPU, emulated, or LLVM CUDA device at run time.
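
The run-time choice among GPU, emulated, and LLVM devices could look roughly like the sketch below. It is an illustration only, not Ocelot or Harmony code: the `Kernel` class, the `predict_runtime` cost model and its constants, and `select_device` are all hypothetical names, and a real predictor would come from Harmony's performance models.

```python
from dataclasses import dataclass

# Hypothetical backend names mirroring the three device kinds named above.
BACKENDS = ("gpu", "emulated", "llvm")


@dataclass
class Kernel:
    """Hypothetical kernel record that carries only PTX source."""
    name: str
    ptx: str
    threads: int


def predict_runtime(kernel, backend):
    """Toy cost model (assumed numbers): launch overhead + per-thread cost."""
    overhead = {"gpu": 50.0, "emulated": 1.0, "llvm": 5.0}
    per_thread = {"gpu": 0.001, "emulated": 1.0, "llvm": 0.05}
    return overhead[backend] + per_thread[backend] * kernel.threads


def select_device(kernel, predictor=predict_runtime):
    """Pick the backend with the lowest predicted runtime for this kernel."""
    return min(BACKENDS, key=lambda backend: predictor(kernel, backend))
```

With these made-up constants, a one-thread kernel is routed to the emulator, a 100-thread kernel to LLVM, and a million-thread kernel to the GPU. The predictor is passed in as a parameter so that a measured or learned model can replace the toy one without changing the selection logic.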

  1. Harmony Should Execute Kernels Using Ocelot Devices
       • Create a new Harmony kernel class that contains only PTX source.
       • Use Ocelot to dynamically translate the kernel to a GPU kernel, an emulated kernel, or an LLVM kernel.
  2. Runtime Device Selection
       • Create a dynamic CUDA device in Ocelot.
       • Use Harmony performance prediction to determine which underlying device to execute each kernel on.
  3. Kernel Dependency Resolution
       • Build basic pointer analysis support into Ocelot.
       • Map cudaMalloc memory allocations to Harmony variables.
       • Track dependencies between CUDA kernels.
       • Execute them with the Harmony runtime.
  4. Kernel Online Optimization
       • Use Harmony to predict the performance improvement of specific optimizations.
       • Dynamically recompile PTX kernels using the best optimizations.
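
As a rough illustration of the kernel dependency resolution step, the sketch below tracks read-after-write, write-after-write, and write-after-read dependencies between kernel launches. It is not Ocelot or Harmony code: `DependencyTracker` and its methods are hypothetical, buffers are identified by plain names standing in for cudaMalloc allocations, and the read/write sets are supplied by hand where a real implementation would derive them from pointer analysis.

```python
from collections import defaultdict


class DependencyTracker:
    """Hypothetical tracker of data dependencies between kernel launches."""

    def __init__(self):
        self.kernels = []                 # kernel names, in launch order
        self.deps = []                    # deps[i]: indices kernel i waits on
        self.last_writer = {}             # buffer -> index of last writer
        self.readers = defaultdict(list)  # buffer -> readers since last write

    def launch(self, name, reads=(), writes=()):
        """Record a launch and return its index."""
        idx = len(self.kernels)
        deps = set()
        for buf in reads:
            if buf in self.last_writer:
                deps.add(self.last_writer[buf])   # read-after-write
        for buf in writes:
            if buf in self.last_writer:
                deps.add(self.last_writer[buf])   # write-after-write
            deps.update(self.readers[buf])        # write-after-read
        # Update buffer state only after all dependencies are collected.
        for buf in reads:
            self.readers[buf].append(idx)
        for buf in writes:
            self.last_writer[buf] = idx
            self.readers[buf] = []
        self.kernels.append(name)
        self.deps.append(deps)
        return idx

    def dependencies(self, idx):
        """Names of the kernels that kernel idx must wait for, sorted."""
        return sorted(self.kernels[d] for d in self.deps[idx])

    def levels(self):
        """Level of each kernel; kernels on the same level may run together."""
        level = []
        for deps in self.deps:
            level.append(1 + max((level[d] for d in deps), default=-1))
        return level
```

For example, launching `init` (writes `x`), then `scale` (reads `x`, writes `y`), `offset` (reads `x`, writes `z`), and `sum` (reads `y` and `z`, writes `x`) yields levels `[0, 1, 1, 2]`: `scale` and `offset` are independent of each other, while `sum` must wait for all three earlier kernels because it overwrites a buffer they wrote or read.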